
A novel multiscale cGAN approach for enhanced salient object detection in single haze images

Abstract

In computer vision, image dehazing is a low-level task that employs algorithms to analyze and remove haze from images, resulting in haze-free visuals. The aim of Salient Object Detection (SOD) is to locate the most visually prominent areas in images. However, most SOD techniques applied to visible images struggle in complex scenarios characterized by similarities between the foreground and background, cluttered backgrounds, adverse weather conditions, and low lighting. Identifying objects in hazy images is challenging due to the degradation of visibility caused by atmospheric conditions, leading to diminished visibility and reduced contrast. This paper introduces an innovative approach called Dehaze-SOD, a unique integrated model that addresses two vital tasks: dehazing and salient object detection. The key novelty of Dehaze-SOD lies in its dual functionality, seamlessly integrating dehazing and salient object identification into a unified framework. This is achieved using a conditional Generative Adversarial Network (cGAN) comprising two distinct subnetworks: one for image dehazing and another for salient object detection. The first module, designed with residual blocks, Dark Channel Prior (DCP), total variation, and the multiscale Retinex algorithm, processes the input hazy images. The second module employs an enhanced EfficientNet architecture with added attention mechanisms and pixel-wise refinement to further improve the dehazing process. The outputs from these subnetworks are combined to produce dehazed images, which are then fed into our proposed encoder–decoder framework for salient object detection. The cGAN is trained with two modules working together: the generator aims to produce haze-free images, whereas the discriminator distinguishes between the generated haze-free images and real haze-free images. Dehaze-SOD demonstrates superior performance compared to state-of-the-art dehazing methods in terms of color fidelity, visibility enhancement, and haze removal. The proposed method effectively produces high-quality, haze-free images from various hazy inputs and accurately detects salient objects within them. This makes Dehaze-SOD a promising tool for improving salient object detection in challenging hazy conditions. The effectiveness of our approach has been validated using benchmark evaluation metrics such as mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM).

1 Introduction

Many contemporary applications heavily rely on the analysis of visual content to identify patterns and make informed decisions. Prominent examples of such applications include smart monitoring systems, tracking systems, and automation systems, where acquiring high-quality images or videos is crucial for achieving precise outcomes and ensuring dependable performance. However, these systems are susceptible to adverse effects caused by environmental factors, with haze and smog being the most prevalent. Vision science researchers have conducted detailed studies to counteract the detrimental influence of these environmental conditions on visual analysis through image dehazing. The connection between a haze-free, unaltered image and a hazy image can be explained using a straightforward physical model referred to as the “haze model” or “atmospheric scattering model”. This model relates the observed image with haze \(L_{\text{observed}}\) to the original non-hazy scene radiance \(L_{\text{scene}}\) as:

$$\begin{aligned} L_{\text {observed}}(a, b) = L_{\text {scene}}(a, b) * t(a, b) + A * (1 - t(a, b)), \end{aligned}$$
(1)

where:

  • \(L_{\text{observed}}(a, b)\) corresponds to the pixel value at position \((a, b)\) in the hazy image.

  • \(L_{\text{scene}}(a, b)\) denotes the pixel value at position \((a, b)\) in the original, non-hazy image.

  • \(t(a, b)\) is the transmission map, indicating the fraction of light that travels from the scene to the camera. It is a value between 0 and 1 and is typically lower in hazy areas.

  • A is the atmospheric light, representing the overall light scattered by the atmosphere and received by the camera.

  • The objective of dehazing, or removing haze from an image, is to estimate the transmission map \(t(a, b)\) and atmospheric light \(A\), allowing for the recovery of the original scene radiance \(L_{\text {scene}}\).

The equation models the formation of a hazy image by combining two components: the direct transmission of light from the scene and the atmospheric scattering of light. The first term, \(L_{\text {scene}}(a, b) * t(a, b)\), represents the direct transmission of light from the scene to the camera, attenuated by the transmission map \(t(a, b)\). In hazy conditions, this term is reduced, leading to lower contrast and visibility. The second term, \(A * (1 - t(a, b))\), accounts for the light scattered by atmospheric particles, which adds a layer of haze to the image. The challenge in image dehazing is to accurately estimate \(t(a, b)\) and \(A\) so that the original scene radiance \(L_{\text {scene}}(a, b)\) can be reconstructed, effectively removing the haze and restoring image clarity.
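To make the roles of \(t(a, b)\) and \(A\) concrete, the short NumPy sketch below applies Eq. (1) in both directions: it synthesizes a hazy image from a clear one and, given the same \(t\) and \(A\), inverts the model to recover the scene radiance. This is only an illustration of the scattering model itself, not of the dehazing pipeline proposed in this paper; the uniform transmission map and the clamp on \(t\) are our assumptions.

```python
import numpy as np

def synthesize_haze(scene, t, A):
    """Eq. (1): L_observed = L_scene * t + A * (1 - t)."""
    t = t[..., np.newaxis] if t.ndim == 2 else t      # broadcast t over RGB channels
    return scene * t + A * (1.0 - t)

def invert_haze_model(observed, t, A, t_min=0.1):
    """Invert Eq. (1) for the scene radiance, given estimates of t and A.
    t is clamped away from zero (a common safeguard) to avoid amplifying noise."""
    t = np.clip(t, t_min, 1.0)
    t = t[..., np.newaxis] if t.ndim == 2 else t
    return (observed - A * (1.0 - t)) / t

# Toy example: a 4x4 RGB "scene" under uniform haze (t = 0.6, A = 0.9).
scene = np.random.rand(4, 4, 3).astype(np.float32)
t_map = np.full((4, 4), 0.6, dtype=np.float32)
hazy = synthesize_haze(scene, t_map, A=0.9)
recovered = invert_haze_model(hazy, t_map, A=0.9)
print(np.allclose(recovered, scene, atol=1e-5))       # True when t and A are exact
```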

Various dehazing algorithms exist to estimate these parameters and perform haze removal. Dehazing techniques are primarily categorized into two main groups: methods that rely on prior information and methods that rely on learning.

Prior-based methods [1,2,3] depend on haze-related characteristics to enhance clarity in hazy images. However, these approaches often impose strict limitations, which can lead to artifacts in the reconstructed images. For instance, He et al. [2] assume that at least one RGB channel is close to zero in a clean natural image, a condition that may not hold true when contextual objects mimic atmospheric light, such as in cases with snow or sky.

In recent developments, learning-based methods [4, 5] have gained prominence in image dehazing. Cai et al. [6] introduced DehazeNet, an end-to-end network that directly learns the complex relationship between hazy images and their transmission maps. Li et al. [7] took a different approach by redefining the atmospheric scattering model and employing a lightweight Convolutional Neural Network (CNN) to restore hazy images directly, thereby eliminating the need to independently estimate atmospheric light and transmission maps. Ren et al. [8] proposed an innovative multiscale CNN that extracts pertinent features directly from hazy images to estimate the transmission map. This network comprises coarse-scale and fine-scale components, yielding promising results in dehazing applications. In [9], a method known as DCPDN was introduced for single-image dehazing. This method simultaneously acquires atmospheric light information, transmission map details, and dehazing process parameters by directly incorporating Eq. (1) into the refinement module.

Advancements in image processing technology have significantly influenced the evolution of image dehazing approaches. Initially, image dehazing was approached primarily as an image enhancement challenge, with methods like contrast improvement and Retinex employed to mitigate foggy conditions. Subsequently, researchers turned to image restoration methods rooted in atmospheric scattering models to tackle the dehazing problem. Notable methods include Dark Channel Prior [2] and Non-local Dehazing [10]. Given the complexity of the image dehazing task, researchers began combining multiple techniques to enhance overall performance, leading to the development of fusion-based methods.

In recent years, the rapid progress in deep learning technology has ushered in a new era, marked by the introduction of numerous deep learning-based dehazing algorithms such as DehazeNet [6], AOD-Net [7], and FFA-Net [11]. Compared to earlier approaches, deep learning-based algorithms have demonstrated significant improvements in both dehazing effectiveness and robustness. The existing literature on salient object detection predominantly focuses on clear and non-hazy images, with limited research specifically addressing salient object detection in single hazy images. As of now, no dedicated model simultaneously tackles the complexities of hazy image restoration and salient object detection. Current methodologies often focus on either hazy image dehazing or salient object detection individually, leaving a gap in the literature for an integrated solution that effectively handles both tasks in a unified way. In this paper, we introduce DehazeSOD, an approach centered on a conditional Generative Adversarial Network (cGAN) consisting of two distinct subnetworks. The first subnetwork, acting as a discriminator, evaluates the clarity of images and distinguishes between hazy and haze-free versions. Simultaneously, the second subnetwork functions as a dedicated saliency detector, identifying prominent objects within the haze-free images. These subnetworks operate collaboratively and competitively, leveraging the principles of adversarial learning. Their synergistic interaction results in the generation of highly accurate saliency maps, demonstrating exceptional performance in salient object detection.

Our research work contributes the following:

  • Introduction of Dehaze-SOD framework: A novel integrated model called Dehaze-SOD is proposed, which simultaneously addresses the challenges of image dehazing and salient object detection within a unified framework.

  • Dual-functionality approach: The key innovation of Dehaze-SOD lies in its ability to effectively perform both dehazing and salient object detection, even in complex scenarios involving haze, low lighting, and cluttered backgrounds.

  • cGAN-based architecture: The model utilizes a cGAN that consists of two distinct subnetworks: one dedicated to image dehazing using advanced techniques such as residual blocks, Dark Channel Prior, total variation, and multiscale Retinex algorithm, and another for salient object detection, enhanced by an improved EfficientNet with attention mechanisms.

A discussion of the literature review is provided in Sect. 2. In Sect. 3, the proposed DehazeSOD method is explained. Details regarding the datasets, evaluation metrics, experimental setup, and comparative studies with qualitative and quantitative analyses are available in Sect. 4. The ablation study conducted is discussed in Sect. 5. The key findings are outlined in Sect. 6.

2 Related work

Due to advancements in deep learning, image dehazing techniques have demonstrated enhanced performance. Image dehazing methods that rely on deep learning typically learn from a significant number of paired images, both hazy and haze-free. This knowledge is then used to extract the intricate characteristics of hazy images through a CNN, enabling the identification of the correspondence between images affected by haze and those that are clear. Compared to conventional methods, approaches based on deep learning tend to excel in terms of both effectiveness and efficiency. This section presents various noteworthy methods that use deep learning for dehazing.

DehazeNet is a trainable end-to-end system for single image haze removal that uses a convolutional neural network-based deep architecture to estimate the medium transmission map of a hazy image [6]. The authors of [8] introduced a multiscale deep neural network (MSCNN) that addresses the challenge of single-image dehazing by leveraging an atmospheric scattering model. Similar to DehazeNet, the MSCNN algorithm is designed to learn the correlation between hazy images and their respective transmission maps. In 2019, Ren et al. [12] introduced an improved iteration of the MSCNN technique, demonstrating significant enhancements in edge preservation.

Zhang et al. [9] introduced the DCPDN, which operates on the atmospheric scattering model (ASM). The architecture includes an encoder–decoder structure that preserves edges and incorporates a multi-level pyramid pooling module inspired by [13] to estimate the transmission map. Besides the aforementioned techniques, numerous other dehazing methods utilize the ASM as a fundamental component. Examples include ABC-Net [14] and LATPN [15], which, like DehazeNet and MSCNN, directly generate transmission maps. On the other hand, PMHLD [16], similar to DCPDN, employs a joint learning approach for both transmission maps and atmospheric light. Furthermore, FAMED-Net [16,17,18] incorporates ASM into the network in a non-explicit manner, resembling the characteristics of D4.

In the present era, numerous ASM-based networks have showcased substantial advancements in the field of single-image dehazing. Upon receiving an input image with haze, diverse methodologies can be employed to achieve haze removal, including the utilization of an encoder–decoder structure, networks based on GANs, attention mechanisms, and knowledge transfer, among others.

The algorithm known as the Gated Fusion Network (GFN), proposed by Ren et al. [19], operates as an end-to-end solution. The Gated Context Aggregation Network (GCANet), proposed by Chen et al. [20], aims to address the challenges of fog and rain removal simultaneously. The novel contribution of GCANet lies in the incorporation of dilated convolution into the image dehazing network, which enhances its effectiveness. Due to the remarkable feature extraction capability of the encoder–decoder structure, numerous outstanding networks for dehazing also adopt the encoder–decoder network as their foundation. AECR-Net [21] integrates the encoder–decoder-like network with contrastive learning, showcasing remarkable performance in both artificially synthesized and real-world datasets. MSBDN-DFF [22] constructs an augmented encoder–decoder network. EDN-GTM [23] implements the application of a guided transmission map in the encoder–decoder dehazing network.

The single-image dehazing network known as Cycle-Dehaze, proposed by Engin et al. [24], is an advanced iteration of the CycleGAN architecture [25]. This approach offers a significant benefit in that there is no requirement to estimate the atmospheric scattering model’s parameters. Singh et al. [26] proposed a novel method using a generative adversarial network named Back-Projected Pyramid Network (BPPNet). Several image dehazing networks make use of Generative Adversarial Networks (GANs). Certain networks incorporate a physical model or other prior knowledge to facilitate the training of GANs, as demonstrated by Pan et al. [27], HardGAN [28], and SA-cGAN [29].

Because the task of SOD involves transforming images, our approach primarily relies on a cGAN [30]. The GAN model achieves desirable results by employing adversarial learning between two key components within its framework: the generator (G) and the discriminator (D). In the realm of SOD tasks, the D is developed to identify the disparity between the estimated haze-free generated image and the actual haze-free image. Conversely, the G aims to generate a binary segmentation map that closely resembles the ground truth, attempting to deceive the D. These two components collaborate to produce more accurate segmentation maps.

A multi-exposure fusion framework proposed by Kumar et al. [31] enhances the contrast of hazy images by synthesizing multi-exposure images to improve visibility in challenging environments. It uses techniques like gamma correction, adaptive histogram equalization, and structural patch decomposition-based fusion to boost local visibility and global contrast. Additionally, a joint gamma correction and multi-resolution fusion scheme was proposed by Kumar et al. [32] for enhancing haze-degraded images. The survey by Babu et al. [33] thoroughly examines state-of-the-art methods for haze-free images, primarily focusing on developments from the last decade, and systematically summarizes the real-time hardware implementations of various haze removal methods. A dynamic stochastic resonance (DSR)-based technique in the spatial domain has been proposed by Chouhan et al. [34] for the enhancement of dark and low-contrast images.

3 Proposed method

Our proposed architecture, outlined in Fig. 1, is a state-of-the-art deep learning model for SOD in hazy images. It consists of two subnetworks that together comprise three modules: a Residual-Enhanced Dehazing Module, an Enhanced EfficientNet Module, and a Salient Object Detection Module.

Generator: The generator incorporates a Residual-Enhanced Dehazing Module and an Enhanced EfficientNet Module. The Residual-Enhanced Dehazing Module processes the input hazy image through the following steps: we begin by applying residual blocks [35], and the output of these residual blocks is fed into the DCP to estimate the initial transmission map of the hazy image. Residual blocks offer several advantages in image dehazing tasks. They enhance feature representation by capturing fine details and subtle variations, which is particularly useful for regions with varying haze densities. The non-linear mapping capabilities of residual blocks empower the model to learn complex input–output relationships, effectively removing haze artifacts in challenging scenarios. Additionally, residual blocks improve the model’s adaptability to different haze densities and distributions, ensuring consistent performance across various real-world situations. The inclusion of identity skip connections within these blocks helps overcome the vanishing gradient problem, facilitating stable and efficient gradient propagation during training. This leads to more effective learning and removal of haze artifacts, even in non-uniform haze distributions.

The DCP exploits the statistical regularity of outdoor haze-free images to estimate the amount of haze. Next, we refine this initial estimate using the Total Variation (TV) method, a powerful tool for noise reduction and edge preservation, which aids in improving the accuracy of the transmission map. We then apply the Multi-Scale Retinex (MSR) [36] algorithm to the TV-refined transmission map. The MSR algorithm adjusts the illumination at different scales, further refining the dehazing result by enhancing the transmission map. To improve the performance of our model, we incorporated an EfficientNet attention module with a self-attention mechanism, a pixel-wise refinement module, and batch normalization. These components handle feature extraction, attention, pixel-level refinement of the output, and normalization of the model’s activations, respectively.
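The residual blocks referred to above follow the identity-skip design of He et al. [35]; since the exact filter counts and kernel sizes used in Dehaze-SOD are not listed here, the Keras sketch below assumes a typical configuration (two 3×3 convolutions with batch normalization and ReLU, plus the identity shortcut) purely for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64, kernel_size=3):
    """Identity-skip residual block: two conv-BN-ReLU stages plus a shortcut.
    Filter count and kernel size are illustrative assumptions."""
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])   # identity skip connection
    return layers.ReLU()(y)

# Hazy input -> initial projection -> a few residual blocks (feature extractor).
inp = layers.Input(shape=(256, 256, 3))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
for _ in range(3):
    x = residual_block(x)
feature_extractor = tf.keras.Model(inp, x, name="residual_dehaze_features")
```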

The output of the Residual-Enhanced Dehazing Module is combined with the Enhanced EfficientNet Module, where the result is evaluated by the discriminator to determine whether the recovered image is real or fake. The DCP is used to estimate the medium transmission t(x). The DCP is based on the observation that in non-sky regions, certain pixels often exhibit low intensities in at least one color channel. It is described as follows:

$$\begin{aligned} \text {DCP}(I)(y) = \min _{x \in \omega (y)} \left( \min _{c \in \{r,g,b\}} \left( \frac{I^c(x)}{A^c} \right) \right) , \end{aligned}$$
(2)

where \(I^c\) represents a color channel of the hazy image, \(A^c\) represents the corresponding color channel of the atmospheric light, and \(\omega (y)\) is a local patch centered at y. This equation defines the Dark Channel Prior (DCP) for an image \(I\). The DCP is calculated by finding the minimum value within a local patch \(\omega (y)\) for each color channel \(c\) (red, green, and blue) of the image, normalized by the corresponding atmospheric light \(A^c\). The DCP helps identify regions with low intensity in at least one color channel, which are typically affected by haze, making it an effective method for estimating the transmission map in dehazing algorithms. The total variation (TV) is a method used for noise reduction and image smoothing. It works by minimizing the total variation of the image, that is, the cumulative sum of the absolute differences between neighboring pixel intensities. For an image, the TV is calculated as:

$$\begin{aligned} \text {TV}(u) = \int \left| \nabla u \right| dx, \end{aligned}$$
(3)

where \(\text {TV}(u)\) represents the total variation of the image u. \(\nabla u\) represents the gradient of the image, and the integral is over the entire image. This equation quantifies the smoothness of the image by measuring the magnitude of the gradient, which corresponds to changes in pixel intensity. Minimizing the TV helps in reducing noise while preserving important edges in the image, making it a powerful technique for image restoration tasks like dehazing. Refining the transmission map by adjusting the illumination at different scales can be achieved through a process known as Multi-Scale Retinex (MSR):

$$\begin{aligned} \text {MSR}(a,b) = \sum _{m=1}^{N} w_m \cdot \text {SSR}_m(a, b), \end{aligned}$$
(4)

where \(\text {MSR}(a,b)\) is the output of the MSR algorithm, \(w_m\) is the weight assigned to the \(m\)th scale, \(N\) is the number of scales, and \(\text {SSR}_m(a,b)\) denotes the reflection image at the \(m\)th scale. The MSR algorithm enhances the visibility of images by combining the effects of multiple scales of the Single-Scale Retinex (SSR) algorithm. Each scale captures different levels of detail, and the weighted sum of these scales produces the final enhanced image. The MSR helps to refine the transmission map by improving local contrast and balancing the illumination, making it a key step in the dehazing process.
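A compact NumPy/OpenCV sketch of the classical priors in Eqs. (2)–(4) is given below: the dark channel, the transmission estimate \(t = 1 - \omega \cdot \text{DCP}\), and a Multi-Scale Retinex built from Gaussian-blurred single-scale outputs. The patch size, \(\omega\), scale values, atmospheric light, and the file name are illustrative assumptions, and the TV refinement step is omitted for brevity; this is a sketch of the priors themselves, not the trained module described above.

```python
import numpy as np
import cv2

def dark_channel(img, patch=15):
    """Eq. (2): per-pixel minimum over RGB, then a minimum filter over a local patch."""
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)              # grayscale erosion = local minimum

def estimate_transmission(img, A, omega=0.95, patch=15):
    """t(x) = 1 - omega * DCP(I / A); omega keeps a trace of haze for realism."""
    normalized = img / np.maximum(A, 1e-6)
    return 1.0 - omega * dark_channel(normalized, patch)

def multi_scale_retinex(gray, scales=(15, 80, 250), weights=None):
    """Eq. (4): weighted sum of single-scale Retinex outputs,
    SSR_m = log(I) - log(Gaussian_m * I)."""
    weights = weights or [1.0 / len(scales)] * len(scales)
    gray = gray.astype(np.float64) + 1e-6          # avoid log(0)
    out = np.zeros_like(gray)
    for w, sigma in zip(weights, scales):
        blurred = cv2.GaussianBlur(gray, (0, 0), sigma)
        out += w * (np.log(gray) - np.log(blurred + 1e-6))
    return out

# Usage on a float RGB image in [0, 1] with an assumed atmospheric light A.
hazy = cv2.cvtColor(cv2.imread("hazy.png"), cv2.COLOR_BGR2RGB) / 255.0
A = np.array([0.9, 0.9, 0.9])
t = estimate_transmission(hazy, A)
t_refined = multi_scale_retinex(t)                 # illumination-balanced transmission
```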

Enhanced EfficientNet module: The hazy image serves as input to a convolutional block, followed by batch normalization. This block typically comprises convolutional layers intended to capture fundamental image characteristics, which are then utilized as input for subsequent stages of the model. After the convolutional block, the extracted features are passed through an enhanced EfficientNet model. EfficientNet [37] is a convolutional neural network architecture that uses a compound scaling method to uniformly scale the depth, width, and resolution of the network. In this case, three blocks of EfficientNet are used. The objective of this phase is to derive more complex, high-level characteristics from the image.

Following the EfficientNet phase, a self-attention mechanism is implemented. This module allows the model to weigh the significance of distinct features in the image, enabling it to concentrate on the most relevant aspects for the dehazing task. The output from the self-attention stage is then refined at the pixel level, which may involve various approaches to improve the quality of the dehazed image.

To enhance the overall effectiveness and stability of the cGAN training process, we apply the tanh activation function to the generator’s output. This helps constrain the pixel values of the generated images to a specific range, facilitating stable training and ensuring compatibility with the discriminator’s output.
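One plausible Keras wiring of this module is sketched below: an initial convolutional block with batch normalization, an EfficientNet backbone (the full B0 standing in for the three EfficientNet blocks described above), self-attention over the flattened feature map realized here with MultiHeadAttention, a 1×1 pixel-wise refinement, and a tanh-bounded output. All layer sizes and the exact attention design are our assumptions, not the configuration used in Dehaze-SOD.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_enhanced_efficientnet(input_shape=(256, 256, 3)):
    inp = layers.Input(shape=input_shape)

    # Initial convolutional block with batch normalization.
    x = layers.Conv2D(32, 3, padding="same")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # EfficientNet backbone as the high-level feature extractor.
    x3 = layers.Conv2D(3, 1)(x)                          # project back to 3 channels
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=None, input_shape=input_shape)
    feats = backbone(x3)                                 # (8, 8, 1280) for a 256x256 input

    # Self-attention over spatial positions (one plausible realization).
    h, w, c = feats.shape[1], feats.shape[2], feats.shape[3]
    seq = layers.Reshape((h * w, c))(feats)
    att = layers.MultiHeadAttention(num_heads=4, key_dim=64)(seq, seq)
    att = layers.Reshape((h, w, c))(att)
    att = layers.Conv2D(128, 1, activation="relu")(att)  # channel reduction

    # Upsample to input resolution and refine pixel-wise with 1x1 convolutions.
    y = layers.UpSampling2D(size=32, interpolation="bilinear")(att)
    y = layers.Concatenate()([y, x])
    y = layers.Conv2D(64, 1, activation="relu")(y)       # pixel-wise refinement
    y = layers.BatchNormalization()(y)
    out = layers.Conv2D(3, 1, activation="tanh")(y)      # bounded output in [-1, 1]
    return tf.keras.Model(inp, out, name="enhanced_efficientnet_generator")
```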

Discriminator: The discriminator in DehazeSOD is composed of four convolutional layers, each designed to extract increasingly complex features from the input pair of images (real or fake haze-free images conditioned on the hazy image). After each convolutional layer, a Leaky ReLU activation function is applied to introduce non-linearity, which helps the network learn complex patterns while allowing a small gradient for inactive units, thereby avoiding dead neurons during training. To further enhance the training process, batch normalization is performed after each convolutional layer. This normalization helps stabilize and accelerate the training by reducing internal covariate shifts. Finally, the output of the convolutional layers is passed through fully connected layers, which combine the extracted features to make the final binary classification, determining whether the input image pair is real or fake. This architecture ensures that the discriminator is robust in distinguishing between real and generated images, conditioned on the hazy inputs. The discriminator takes two types of input: the real haze-free image conditioned on the hazy input image, and the generated haze-free image also conditioned on the same hazy input image. Its task is to distinguish between these two scenarios:

  • Real data loss: The first term \(\frac{1}{N} \sum _{j=1}^{N} \log \left( D(y_j \mid x_j)\right)\) computes the average binary cross-entropy loss for real haze-free images. This measures how well the discriminator can identify real haze-free images from the training set when conditioned on their corresponding hazy images.

  • Fake data loss: The second term \(\frac{1}{N} \sum _{j=1}^{N} \log \left( 1 - D(G(z_j \mid x_j))\right)\) computes the average binary cross-entropy loss for fake haze-free images generated by the generator. This term measures how well the discriminator can detect that the images generated by the generator are fake when conditioned on the hazy input images.

The generator and discriminator are iteratively trained in such a way that the probability values predicted by the discriminator approach 1 for real images and 0 for generated (fake) images. The generator’s goal is to produce haze-free images that are increasingly realistic, making it difficult for the discriminator to correctly classify them as fake. Conversely, the discriminator is trained to accurately distinguish between real and generated images, with the aim of maximizing the probability difference between them. This adversarial process continues until the generated images become nearly indistinguishable from real images, effectively fooling the discriminator.
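To make the conditioning explicit, a minimal Keras sketch of such a discriminator is given below: the hazy image and the candidate haze-free image are concatenated along the channel axis, passed through four Conv–BatchNorm–LeakyReLU stages, and classified by fully connected layers. Filter counts, strides, and the sigmoid output are assumptions rather than the exact settings of Dehaze-SOD.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(input_shape=(256, 256, 3)):
    hazy = layers.Input(shape=input_shape, name="hazy_condition")
    candidate = layers.Input(shape=input_shape, name="real_or_fake_dehazed")
    x = layers.Concatenate()([hazy, candidate])      # condition on the hazy input

    for filters in (64, 128, 256, 512):              # four convolutional stages
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)

    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)   # probability that the pair is real
    return tf.keras.Model([hazy, candidate], out, name="discriminator")
```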

Fig. 1 Visual representation of the proposed method

Salient object detection module:

The proposed module is an encoder–decoder architecture inspired by the U-Net model [38,39,40], featuring unique modifications and improvements that make it exceptionally suitable for salient object detection. The architecture of the encoder and decoder is explained below:

Encoder:

The convolution operation between an image and a kernel is represented by the following equation:

$$\begin{aligned} Y_{ji} = \sum _{n} \sum _{m} \text{Img}_{(j-n)(i-m)} \cdot K_{nm}, \end{aligned}$$
(5)

where:

  • \(Y_{ji}\) is the output value at position \((j, i)\) in the output image \(Y\).

  • \(\text{Img}_{(j-n)(i-m)}\) is the pixel value of the input image Img at the position \((j-n, i-m)\).

  • \(K_{nm}\) is the value of the kernel at the position \((n, m)\).

In this equation:

  • \(j-n\) represents the row index of the image pixel relative to the current position \(j\) in the output image.

  • \(i-m\) represents the column index of the image pixel relative to the current position \(i\) in the output image.

These indices (\(j-n\) and \(i-m\)) effectively shift the position on the image Img by \(n\) and \(m\), respectively, as the kernel slides over the image. This shifting allows the kernel to be applied at each position, computing the weighted sum of the pixel values covered by the kernel, with the weights given by the kernel values. Each of these operations is followed by batch normalization and activation functions. These layers process the input image and progressively derive hierarchical features. In this architecture, the first convolutional layer takes input images of size \(256 \times 256\) with a single channel. It applies a 2D convolution operation with 64 filters of size \(3 \times 3\). Subsequent convolutional layers maintain the same filter size and channel count, effectively increasing the depth of feature extraction.
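As a sanity check on Eq. (5), the short NumPy sketch below evaluates one output value \(Y_{ji}\) by summing \(\text{Img}_{(j-n)(i-m)} \cdot K_{nm}\) over the kernel indices. It is a direct transcription of the formula, not the library convolution used in the network.

```python
import numpy as np

def conv_at(img, kernel, j, i):
    """Direct evaluation of Eq. (5) at output position (j, i)."""
    total = 0.0
    for n in range(kernel.shape[0]):
        for m in range(kernel.shape[1]):
            total += img[j - n, i - m] * kernel[n, m]
    return total

img = np.arange(25, dtype=np.float32).reshape(5, 5)
kernel = np.array([[0, 1], [1, 0]], dtype=np.float32)
print(conv_at(img, kernel, j=2, i=2))   # img[2,1]*K[0,1] + img[1,2]*K[1,0] = 11 + 7 = 18
```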

Batch normalization is represented by the following equation:

$$\begin{aligned} Y_{i} = \frac{X_{i} - \mu _{B}}{\sqrt{\sigma _{B}^{2} + \epsilon }} \cdot \gamma + \beta , \end{aligned}$$
(6)

where:

  • \(X_{i}\) denotes the input to the batch normalization layer,

  • \(\mu _{B}\) refers to the batch mean,

  • \(\sigma _{B}^{2}\) represents the batch variance,

  • \(\epsilon\) represents a small constant for numerical stability, and

  • \(\gamma\) and \(\beta\) are learnable parameters of the layer.

This equation normalizes the input \(X_{i}\) by subtracting the batch mean \(\mu _{B}\) and dividing by the square root of the batch variance \(\sigma _{B}^{2}\) plus a small constant \(\epsilon\). The result is then scaled and shifted by the learnable parameters \(\gamma\) and \(\beta\), respectively. This normalization helps in stabilizing and accelerating the training process. The convolutional layers contribute to learning local patterns, edges, and textures in the input image. To capture complex relationships within the data, emphasize important features, and ultimately enhance the accuracy of segmentation tasks, we used multiplication operations in conjunction with dense layers for hierarchical feature fusion. Additionally, in the attention mechanisms, these operations dynamically adjust the importance of each channel. After each convolutional layer, batch normalization is applied to normalize the activations, helping to accelerate the training process and improve convergence. Rectified Linear Unit (ReLU) activation functions are applied after batch normalization to introduce non-linearity, allowing the network to learn complex relationships. Dropout layers are added following the activation functions to mitigate overfitting by randomly deactivating neurons during training. The spatial dimensions of the feature maps are progressively reduced through max-pooling layers with a 2 \(\times\) 2 window and a stride of 2, which downsamples the feature maps by a factor of 2. The global average pooling operation takes the feature maps from the encoder and computes the average value along each channel, resulting in a global representation with reduced spatial dimensions. This operation helps aggregate important information while reducing the risk of overfitting. The compressed representation obtained from global average pooling is then passed through a series of dense layers, further decreasing the dimensionality of the data and resulting in a concise yet informative feature vector.
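Putting the encoder pieces together (3×3 convolutions with 64 filters, batch normalization, ReLU, dropout, 2×2 max-pooling, global average pooling, and dense compression), a minimal Keras sketch is shown below. The number of stages, the dropout rate, and the dense layer sizes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_stage(x, filters=64, dropout=0.3):
    """Conv -> BatchNorm -> ReLU -> Dropout -> 2x2 max-pool, as described above."""
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Dropout(dropout)(x)
    skip = x                                    # kept for the decoder's skip connection
    x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    return x, skip

inp = layers.Input(shape=(256, 256, 1))
x, skips = inp, []
for _ in range(3):                              # number of stages is an assumption
    x, skip = encoder_stage(x)
    skips.append(skip)

pooled = layers.GlobalAveragePooling2D()(x)     # global average per channel
compressed = layers.Dense(128, activation="relu")(pooled)
compressed = layers.Dense(64, activation="relu")(compressed)
encoder = tf.keras.Model(inp, [compressed] + skips, name="sod_encoder")
```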

Decoder: The decoder is responsible for reconstructing the original spatial dimensions from the compressed representation and generating the final output. It comprises transposed convolutional layers and skip connections from the encoder. The decoder utilizes a sequence of transposed convolutional layers to gradually increase the resolution of the compressed representation. These layers apply deconvolutional operations, effectively expanding the spatial dimensions of the feature maps while reducing the number of channels. Specific filter sizes and strides are not provided in the architecture. The decoder incorporates skip connections by concatenating feature maps from the encoder at corresponding layers. These connections enable the decoder to access high-resolution features from the encoder, aiding in accurate reconstruction. The decoder performs convolutional transpose operations to upsample the feature maps and concatenates them with the skip connections, which helps merge detailed features from the encoder. The final layers of the decoder include further convolutional operations, batch normalization, activation functions, and a convolutional layer with a single channel to produce the reconstructed output. This output has the same dimensions as the original input, specifically 256 \(\times\) 256 pixels, and consists of a single channel.
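Continuing the encoder sketch above (reusing inp, skips, and compressed), a matching decoder can be written as follows; since specific filter sizes and strides are not fixed in the description, the values here are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_decoder(skips, compressed, filters=64):
    """Transposed-convolution decoder with skip connections from the encoder."""
    h = 256 // (2 ** len(skips))                         # spatial size at the bottleneck
    x = layers.Dense(h * h * filters, activation="relu")(compressed)
    x = layers.Reshape((h, h, filters))(x)
    for skip in reversed(skips):                         # shallowest skip is used last
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])              # merge encoder detail
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return layers.Conv2D(1, 1, activation="sigmoid")(x)  # 256x256x1 saliency map

saliency = build_decoder(skips, compressed)
sod_model = tf.keras.Model(inp, saliency, name="sod_module")
```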

3.1 Loss functions

Generator loss:

The loss function in our model includes three different types of loss: adversarial loss, mean squared error (MSE) loss, and total variation loss. The combined loss function can be expressed as:

$$\begin{aligned} \text {Adversarial loss} + \lambda _{\text {MSE}} \times \text {MSE Loss} + \lambda _{\text {TV}} \times \text {total variation loss}, \end{aligned}$$

where:

  • Adversarial loss: This term encourages the generated images to be indistinguishable from real images by a discriminator network. It is often used in cGANs to improve the realism of the generated images by forcing the generator to create outputs that are increasingly closer to the real distribution.

  • \(\lambda _{\text {MSE}}\): This is a weighting factor for the MSE Loss. It controls the contribution of the MSE Loss to the total loss, allowing for a balance between pixel-level accuracy and other aspects of the generated image.

  • MSE loss: Mean squared error loss is given by:

    $$\begin{aligned} \text {MSE Loss} = \frac{1}{N} \sum _{i=1}^{N} (y_i - \hat{y}_i)^2, \end{aligned}$$

    where \(y_i\) represents the true values, \(\hat{y}_i\) represents the predicted values, and \(N\) is the number of pixels in the image. The MSE Loss measures the average squared difference between the predicted values and the actual values, promoting accuracy in the reconstruction of the images by penalizing large deviations.

  • \(\lambda _{\text {TV}}\): This is a weighting factor for the total variation loss, which controls its influence on the overall loss function. A higher \(\lambda _{\text {TV}}\) places more emphasis on smoothness and edge preservation in the generated images.

  • Total variation loss: This term helps in reducing noise and preserving edges in the generated images by penalizing rapid intensity changes between neighboring pixels. It is particularly useful for creating images that are smooth while maintaining important structural details, such as edges. The Total Variation Loss can be expressed as:

    $$\begin{aligned} \text {Total variation loss} = \sum _{i,j} \left( (x_{i+1,j} - x_{i,j})^2 + (x_{i,j+1} - x_{i,j})^2 \right) , \end{aligned}$$

    where \(x_{i,j}\) represents the pixel value at position \((i,j)\) in the generated image. This equation sums the squared differences in pixel intensities across the horizontal and vertical directions, penalizing abrupt changes and encouraging smoothness.

By combining these losses, the model aims to generate high-quality images that are both accurate and visually pleasing, balancing realism, accuracy, and smoothness.
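The combined generator objective can be written compactly in TensorFlow as sketched below, where disc_fake_output is the discriminator's prediction for the generated image; the weighting factors are placeholders rather than the values used in our experiments.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def generator_loss(disc_fake_output, generated, clear,
                   lambda_mse=100.0, lambda_tv=1e-4):
    """Adversarial + lambda_MSE * MSE + lambda_TV * total variation.
    The lambda values are illustrative placeholders."""
    adv = bce(tf.ones_like(disc_fake_output), disc_fake_output)   # fool the discriminator
    mse = tf.reduce_mean(tf.square(clear - generated))            # pixel-level fidelity
    tv_h = tf.reduce_sum(tf.square(generated[:, 1:, :, :] - generated[:, :-1, :, :]))
    tv_w = tf.reduce_sum(tf.square(generated[:, :, 1:, :] - generated[:, :, :-1, :]))
    return adv + lambda_mse * mse + lambda_tv * (tv_h + tv_w)     # smoothness summed over batch
```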

The adversarial loss is computed by evaluating the probabilities assigned by the discriminator \(D\) to the reconstructed images \(G(I_{\text {hazy}})\) across all training samples. This loss measures how well the generator \(G\) can fool the discriminator into believing that the generated images \(G(I_{\text {hazy}})\) are real, thereby encouraging the generator to produce more realistic and convincing outputs.

The adversarial loss function used in our model is defined as:

$$\begin{aligned} L_{\text {adv}}&= -\sum _{n=1}^{N} \log D(G(I_{\text {hazy}_n})), \end{aligned}$$
(7)

where:

  • \(L_{\text {adv}}\): This represents the adversarial loss.

  • \(N\): The total number of samples in the dataset.

  • \(I_{\text {hazy}_n}\): The \(n\)th hazy input image.

  • \(G(I_{\text {hazy}_n})\): The generator \(G\) output, which is the generated image from the \(n\)th hazy input image.

  • \(D(G(I_{\text {hazy}_n}))\): The discriminator \(D\) output, which predicts the probability that the generated image \(G(I_{\text {hazy}_n})\) is a real image.

In this adversarial loss function:

  • The generator \(G\) tries to create images from the hazy inputs \(I_{\text {hazy}_n}\) that are indistinguishable from real images.

  • The discriminator \(D\) attempts to correctly identify whether an image is real or generated.

  • The term \(-\log D(G(I_{\text {hazy}_n}))\) penalizes the generator for producing images that the discriminator \(D\) can easily identify as fake. The generator aims to minimize this loss, which corresponds to maximizing the discriminator’s output \(D(G(I_{\text {hazy}_n}))\).

By minimizing the adversarial loss \(L_{\text {adv}}\), the generator improves its ability to produce realistic images from hazy inputs, effectively fooling the discriminator into classifying the generated images as real.

MSE loss

\(L_2\) Loss corresponds to the MSE loss, quantifying the pixel difference between the generated images and the clear images. It calculates the mean squared disparity between the predicted values and the actual values. In the context of image generation or reconstruction, it is often employed to measure the difference between the generated/reconstructed image and the clear image. The mean squared error (MSE) loss function used in our model is defined as:

$$\begin{aligned} \text {MSE}\_\text{Loss} = \frac{1}{N} \sum _{i=1}^{N} (I_{\text {clear}_i} - G(I_{\text {hazy}_i}))^2, \end{aligned}$$
(8)

where:

  • \(\text {MSE}\_\text{Loss}\): This represents the mean squared error loss.

  • \(N\): The total number of samples in the dataset.

  • \(I_{\text {clear}_i}\): The \(i\)th clear (ground truth) image.

  • \(G(I_{\text {hazy}_i})\): The generator \(G\)’s output, which is the generated image from the \(i\)th hazy input image \(I_{\text {hazy}_i}\).

In this MSE loss function:

  • The term \((I_{\text {clear}_i} - G(I_{\text {hazy}_i}))^2\) represents the squared difference between the \(i\)th clear image and the generated image.

  • The sum \(\sum _{i=1}^{N}\) aggregates these squared differences over all \(N\) samples in the dataset.

  • Dividing by \(N\) computes the average of these squared differences, resulting in the mean squared error.

The MSE loss measures the average squared difference between the true clear images and the images generated by the model. By minimizing the MSE loss, the generator improves its accuracy in reconstructing the clear images from the hazy inputs, ensuring that the generated images are as close as possible to the ground truth.

Discriminator loss: The overall loss of the discriminator in a cGAN is the combination of the real data loss and the fake data loss. In this context, the cGAN is trained to differentiate between real haze-free images and those generated by the model from hazy images. To calculate the loss of the discriminator, we use the binary cross-entropy (BCE) loss function.

In mathematical terms, the loss of the discriminator (\(\mathcal {L}_{\text {D}}\)) can be formulated as:

$$\begin{aligned} \mathcal {L}_{\text {D}} = -\left( \frac{1}{N} \sum _{j=1}^{N} \log \left( D(y_j \mid x_j)\right) + \frac{1}{N} \sum _{j=1}^{N} \log \left( 1 - D(G(z_j \mid x_j))\right) \right) , \end{aligned}$$
(9)

where:

  • \(D(y_j \mid x_j)\) represents the discriminator’s output for a real haze-free image \(y_j\) conditioned on the corresponding hazy image \(x_j\).

  • \(D(G(z_j \mid x_j))\) represents the discriminator’s output for a generated (fake) haze-free image \(G(z_j)\) conditioned on the same hazy image \(x_j\).

  • \(N\) is the batch size.

  • \(\log\) is the natural logarithm.
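Eq. (9) is the standard conditional-GAN discriminator objective and can be sketched in TensorFlow as follows, where disc_real_output and disc_fake_output are the discriminator's predictions for real and generated haze-free images, both conditioned on the hazy inputs.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(disc_real_output, disc_fake_output):
    """Eq. (9): BCE pushes real predictions toward 1 and fake predictions toward 0."""
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)   # -mean log D(y|x)
    fake_loss = bce(tf.zeros_like(disc_fake_output), disc_fake_output)  # -mean log(1 - D(G(z|x)))
    return real_loss + fake_loss
```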

4 Results and discussion

In this section, we first outline the datasets utilized to assess the efficacy of the method we proposed. Following that, our experimental setup, including implementation specifics, evaluation metrics, and state-of-the-art comparisons, is explained.

4.1 Datasets

In the realm of image processing, there is a notable gap in the availability of a comprehensive dataset that integrates hazy images with corresponding salient object ground truth. To address this gap, we employed the NH-Haze [41] dataset to evaluate the haze removal task and the ECSSD [42] dataset for the salient object detection task.

  • DENSE-HAZE [43]: The DENSE-HAZE dataset is a specialized collection of images developed for the NTIRE 2019 challenge, aimed at addressing single-image dehazing under dense fog conditions. It consists of 45 training images, 5 validation images, and 5 testing images. In our work, we utilize the official training and validation sets to train our model. The DENSE-HAZE dataset offers a realistic and challenging benchmark, crucial for advancing state-of-the-art image dehazing techniques.

  • NH-Haze: The NH-HAZE [41] dataset, on the other hand, was created for the NTIRE 2020 challenge, focusing on non-homogeneous haze removal in images. It contains 55 pairs of high-resolution outdoor images, each with a corresponding haze-free ground truth image.

  • ECSSD [44]: This dataset is specifically designed to enhance image segmentation and facilitate research in complex scene saliency. Each image in the ECSSD dataset is matched with a corresponding ground-truth mask. The salient object detection module undergoes evaluation using this dataset. The dataset includes several images with complex scenes, varied textures, and low-contrast colors, with a total of 1000 natural images used for analysis.

4.2 Experimental setup

Although each dataset possesses distinct characteristics, we employ a uniform training strategy across all of them. This involves randomly cropping regions of size 256 \(\times\) 256 from the images. To enhance the training data, we introduce variability through random rotations and horizontal flips. The model is trained using a batch size of 16 and the Adam optimizer, with \(\beta _{1}\) set to 0.9 and \(\beta _{2}\) set to 0.999. During training, the learning rate is adjusted using a schedule function, which maintains a consistent rate for the first 10 epochs and then reduces it by a factor of 0.001. Initially, the learning rate is set to 0.001. The same optimizer and training strategies are applied to the discriminator. The model is implemented in Python 3 using TensorFlow, a popular framework in the field. Training and testing are performed on a single NVIDIA Tesla V100-SXM2 GPU.
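The optimizer and learning-rate schedule can be set up as sketched below; the text specifies only that the rate is held for 10 epochs and then reduced, so the exponential decay used here is an assumption.

```python
import math
import tensorflow as tf

def lr_schedule(epoch, lr):
    """Keep the initial rate for the first 10 epochs, then decay it (assumed exponential)."""
    return lr if epoch < 10 else lr * math.exp(-0.001)

gen_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
disc_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule, verbose=1)
# model.fit(..., batch_size=16, callbacks=[lr_callback])  # used with the Keras training loop
```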

4.3 Evaluation metrics

An extensive analysis and evaluation of our proposed approach was conducted on multiple standard datasets, contrasting its performance with state-of-the-art methods. The experimental results demonstrate significant enhancements achieved by our method in both qualitative and quantitative metrics, thereby substantiating its effectiveness. The model’s performance is assessed by comparing dehazed images with clear ground truth.

Peak signal-to-noise ratio (PSNR): PSNR [21] is defined via the mean squared error, which measures the average of the squares of the errors between the original and the reconstructed image. The equations are:

$$\begin{aligned} \text {MSE} = \frac{1}{xy}\sum _{i=0}^{x-1}\sum _{j=0}^{y-1}\left[I(i,j) - K(i,j)\right]^2, \end{aligned}$$
(10)

where I(i, j) is the original image, K(i, j) is the reconstructed image, and x and y are the dimensions of the images.

$$\begin{aligned} \text {PSNR} = 20 \cdot \log _{10}\left( \frac{\text {MAX}_I}{\sqrt{\text {MSE}}}\right) , \end{aligned}$$
(11)

where \(\text {MAX}_I\) is the maximum pixel value of the image. A higher PSNR indicates that the reconstruction is of higher quality.
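Eqs. (10) and (11) translate directly into NumPy; the sketch below assumes 8-bit images, so \(\text{MAX}_I = 255\).

```python
import numpy as np

def psnr(original, reconstructed, max_i=255.0):
    """Eqs. (10)-(11): MSE over all pixels, then PSNR in decibels."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return 20.0 * np.log10(max_i / np.sqrt(mse))
```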

Structural Similarity Index (SSIM): SSIM [21] is a technique employed to measure the similarity between two images. It is a full-reference metric, meaning it assesses image quality against an initial uncompressed or distortion-free image used as a reference. It aims to improve upon established metrics such as PSNR and mean squared error (MSE).

$$\begin{aligned} \text {SSIM}(i,j) = \frac{(2\mu _i\mu _j + c_1)(2\sigma _{ij} + c_2)}{(\mu _i^2 + \mu _j^2 + c_1)(\sigma _i^2 + \sigma _j^2 + c_2)}. \end{aligned}$$
(12)

where:

  • i and j are the two images being compared (clear and the dehazed image).

  • \(\mu _i\) and \(\mu _j\) are the average pixel intensities of images i and j, respectively.

  • \(\sigma _i^2\) and \(\sigma _j^2\) are the variances of images i and j, respectively.

  • \(c_1\) and \(c_2\) are two variables used to stabilize division with a weak denominator.

  • \(c_1 = (k_1 L)^2\) and \(c_2 = (k_2 L)^2\), where \(L\) is the dynamic range of the pixel values, with \(k_1 = 0.01\) and \(k_2 = 0.03\) by default.

In addition to these metrics, we also considered the running time of the model and the visual quality of the dehazed images.
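For reference, a simplified version of Eq. (12) computed from global image statistics is sketched below; standard SSIM implementations (and the values reported here) evaluate the same expression over local windows and average the result, so this version is only illustrative.

```python
import numpy as np

def ssim_global(i, j, k1=0.01, k2=0.03, dynamic_range=255.0):
    """Eq. (12) with global statistics (illustrative; real SSIM uses local windows)."""
    i = i.astype(np.float64)
    j = j.astype(np.float64)
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    mu_i, mu_j = i.mean(), j.mean()
    var_i, var_j = i.var(), j.var()                      # sigma_i^2 and sigma_j^2
    cov_ij = ((i - mu_i) * (j - mu_j)).mean()            # sigma_ij
    return ((2 * mu_i * mu_j + c1) * (2 * cov_ij + c2)) / \
           ((mu_i ** 2 + mu_j ** 2 + c1) * (var_i + var_j + c2))
```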

Table 1 Quantitative comparisons of our method with other methods [45] and percentage differences in PSNR and SSIM values compared to other state-of-the-art methods
Table 2 Quantitative results of the proposed SOD module [44]
Table 3 Comparisons of state-of-the-art methods in terms of inference time for processing a single image of size 1600 \(\times\) 1200
Fig. 2 Qualitative results of our method and other methods on the NH-HAZE dataset

4.4 State-of-the-art comparisons

We evaluate our proposed method against several state-of-the-art techniques for both image dehazing and salient object detection using the datasets mentioned above. The comparison includes methods such as DCP [2], AOD-Net [7], GCANet [20], FFA [11], TDN [46], and DW-GAN [45].

Additionally, we assessed the performance of the salient object detection module using current state-of-the-art methods, calculating the mean absolute error (MAE) based on the method described in [44]. This includes BASNet [47], ITSD, DUMRN [44], EGNet [48], AFNet [49], F3Net [50], Gatenet [51], LDF [52], MINet [53], and PoolNet [54].

Qualitative results comparison: Figures 2 and 3 show the haze-free images and the results for salient object detection, respectively. The results for haze-free images produced by the DehazeSOD method are specifically highlighted in Figure 2. We observed that our method does not completely mimic the human visual system in detecting salient objects. It encounters difficulties when applied to images with shadows or when both the background and foreground are of similar colors. In images with complex backgrounds, shadows, or dense haze, the method’s performance may degrade due to several factors. Firstly, shadows introduce additional variations in brightness and contrast, which can mislead the method, resulting in inaccurate detection or segmentation. Secondly, dense haze reduces overall visibility and contrast, making it challenging for the method to identify key features or edges. Thirdly, when the foreground and background share similar colors, the method struggles to differentiate between them due to its reliance on color contrast for saliency detection. These factors collectively contribute to the method’s difficulty in accurately processing such images, leading to suboptimal outcomes, such as blurred boundaries, missed objects, or incorrect saliency estimation. Instances where our method failed to remove haze and detect salient objects are illustrated in Fig. 4.

The DCP method (results shown in the fourth column) tends to produce noticeably brighter results on the test set and more blue-tinted outcomes in real-world scenarios. AOD-Net (results shown in the fifth column) often exhibits significant color distortion and only partial haze removal in real-world scenarios. While GCANet and FFA (results shown in the sixth and seventh columns, respectively) outperform the aforementioned methods, they still struggle to effectively handle hazy areas. In the NH-HAZE dataset, FFA struggles to effectively remove haze and often generates unpleasant artifacts. Notably, TDN was the winning method in the NTIRE 2020 Non-Homogeneous Dehazing Challenge; however, it surprisingly does not yield satisfactory results on the NH-HAZE dataset, as shown in Figure 2. It is noteworthy that our proposed method exhibits strong performance across all datasets, underscoring the robustness of our model. Our dehazed images, as shown in Fig. 2, are not only visually appealing, but also closely resemble the ground truths in both tasks.

Fig. 3 Qualitative results of our method and other state-of-the-art methods on the ECSSD dataset [44]

Fig. 4 Failure cases of the salient object detection task on the ECSSD dataset (a–c) and of the haze removal task on NH-Haze (d–f): a image, b ground truth, c our result, d hazy image, e clear image, f our result

Quantitative results comparison: Our quantitative experimental results are displayed in Tables 1 and 2. Across two real-world datasets, our method demonstrates strong performance in terms of both PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index) [55], achieving an average improvement of 21.6% in PSNR and 22.6% in SSIM over the compared methods. Achieving success on large-scale benchmarks often necessitates intricate network design. To obtain these average percentage changes, we compare the performance metrics (PSNR, SSIM) of our method against each baseline, compute the percentage change for each instance, and then average these changes over all instances; the result reflects the overall improvement of our method relative to the baselines. However, when compared to DW-GAN, DehazeSOD does not outperform it in certain metrics: DW-GAN still leads in terms of PSNR and SSIM, setting a high benchmark for performance. Despite this, DehazeSOD significantly surpasses other methods in the field, achieving notable improvements over techniques such as DCP, AOD-Net, and FFA. This highlights the robustness and effectiveness of DehazeSOD in image dehazing and salient object detection tasks, as shown in Tables 1 and 2.

We evaluated the processing time of our approach compared to state-of-the-art (SOTA) methods for analyzing a 1600 \(\times\) 1200 image using an NVIDIA 1080Ti GPU. As shown in Table 3, AOD-Net and DCP complete the dehazing process more quickly. However, these two methods fall short in effectively removing haze. Importantly, our proposed method outperforms DCP, GCANet, FFA, DW-GAN, and TDN in terms of running time.

5 Ablation study

Three distinct networks were constructed by selectively combining the modules to illustrate their individual and combined importance:

  • Haze removal module only: Demonstrates the performance of the network when solely focused on haze removal. This module is responsible for removing haze from images, which is crucial for improving image clarity and quality. The use of this module alone shows a PSNR of 21.68 and an SSIM of 0.661, indicating its effectiveness in haze removal.

  • Salient object detection module only: Shows the network’s ability to detect salient objects without additional enhancements. This module focuses on identifying and segmenting salient objects within the image. When used alone, it achieves a PSNR of 20.13 and an SSIM of 0.683, reflecting its impact on object detection and its relevance to overall image quality.

  • Attention module only: Highlights the impact of the attention mechanism on refining salient features. This module helps in refining the focus on significant regions in the image, enhancing both the detection and quality of salient objects. The attention module alone shows a PSNR of 20.68 and an SSIM of 0.675, highlighting its role in improving image feature recognition.

Finally, the combination of all modules into a single network yields the best performance, with a PSNR of 22.10 and an SSIM of 0.695, underscoring the importance of integrating all modules for optimal results, as presented in Table 4.

Table 4 Ablation study for architecture and loss functions: Ladv represents the generator loss, L2 denotes the mean squared error (MSE) loss, and \(\mathcal {L}_{\text {D}}\) indicates the discriminator loss

6 Conclusion

In conclusion, this study introduces DehazeSOD, a novel method that addresses the dual challenges of dehazing and salient object detection in images. DehazeSOD integrates two distinct subnetworks: one for image dehazing and an encoder–decoder framework for salient object detection. The dehazing subnetwork utilizes a combination of residual blocks, Dark Channel Prior, total variation, and the multiscale retinex algorithm, while the second subnetwork utilizes an EfficientNet architecture enhanced with attention mechanisms and pixel-wise refinement. However, our objective is to construct a model that strikes an optimal balance between robust mapping capabilities and the avoidance of overfitting. Table 3 presents the processing time for a single image. The processing time for predicting a salient object, which includes haze removal, is approximately 0.045 s with our trained module. The cGAN’s discriminator effectively distinguishes between generated haze-free images and real clear images. Our proposed DehazeSOD approach demonstrates outstanding performance in color fidelity, visibility enhancement, and haze removal compared to existing dehazing methods. This is validated by standard evaluation metrics like PSNR, SSIM, and MAE, surpassing the current state of the art (as shown in Table 1). Our experimental results show significant improvements, with an average increase in PSNR values of 21.6% on the NTIRE19 dataset and 30.3% on the NTIRE20 dataset, and an average increase in SSIM values of 22.6% and 14.16% on the NTIRE19 and NTIRE20 datasets, respectively. Additionally, the proposed method achieves a lower mean absolute error (MAE) compared to other state-of-the-art methods. These results suggest that DehazeSOD not only competes with, but also outperforms existing methods, potentially establishing a new benchmark in image dehazing and salient object detection. Consequently, it holds promise as a valuable tool for enhancing image quality in hazy conditions. Furthermore, comprehensive experimental results demonstrate DehazeSOD’s exceptional performance in real-world scenarios with dense and non-homogeneous haze. In our future work, we aim to broaden the datasets to encompass a wider range of real-world applications and further enhance the effectiveness of the proposed method by refining network structures.

Data availability

Data will be made available upon request.

References

  1. R. Fattal, Dehazing using color-lines. ACM Trans. Graphics (TOG) 34(1), 1–14 (2014)

  2. K. He, J. Sun, X. Tang, Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2010)

  3. Q. Zhu, J. Mai, L. Shao, A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 24(11), 3522–3533 (2015)

  4. A. Kumar, R.K. Jha, N.K. Nishchal, An improved gamma correction model for image dehazing in a multi-exposure fusion framework. J. Vis. Commun. Image Represent. 78, 103–136 (2021)

  5. A. Mehra, P. Narang, M. Mandal, Theianet: towards fast and inexpensive cnn design choices for image dehazing. J. Vis. Commun. Image Represent. 77, 103–148 (2021)

  6. B. Cai, X. Xu, K. Jia, C. Qing, D. Tao, Dehazenet: an end-to-end system for single image haze removal. IEEE Trans. Image Process. 25(11), 5187–5198 (2016)

  7. B. Li, X. Peng, Z. Wang, J. Xu, D. Feng, Aod-net: All-in-one dehazing network. in Proceedings of the IEEE international conference on computer vision, p. 4770–4778 (2017)

  8. W. Ren, S. Liu, H. Zhang, J. Pan, X. Cao, M.-H. Yang, Single image dehazing via multi-scale convolutional neural networks. Computer Vision-ECCV, 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II 14. Springer 2016, p. 154–169 (2016)

  9. H. Zhang, V. M. Patel, Densely connected pyramid dehazing network. in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3194–3203 (2018)

  10. D. Berman, S. Avidan et al., Non-local image dehazing, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1674–1682

  11. X. Qin, Z. Wang, Y. Bai, X. Xie, H. Jia, Ffa-net: feature fusion attention network for single image dehazing. in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 07, pp. 11908–11915 (2020)

  12. W. Ren, J. Pan, H. Zhang, X. Cao, M.-H. Yang, Single image dehazing via multi-scale convolutional neural networks with holistic edges. Int. J. Comput. Vis. 128, 240–259 (2020)

  13. T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection. in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117–2125 (2017)

  14. C. Wang, Y. Zou, Z. Chen, Abc-net: avoiding blocking effect & color shift network for single image dehazing via restraining transmission bias. in 2020 IEEE International Conference on Image Processing (ICIP). IEEE, pp. 1053–1057 (2020)

  15. R. Liu, X. Fan, M. Hou, Z. Jiang, Z. Luo, L. Zhang, Learning aggregated transmission propagation networks for haze removal and beyond. IEEE Trans. Neural Netw Learn Syst. 30(10), 2973–2986 (2018)

  16. W.-T. Chen, H.-Y. Fang, J.-J. Ding, S.-Y. Kuo, Pmhld: patch map-based hybrid learning dehazenet for single image haze removal. IEEE Trans. Image Process. 29, 6773–6788 (2020)

  17. J. Zhang, D. Tao, Famed-net: a fast and accurate multi-scale end-to-end dehazing network. IEEE Trans. Image Process. 29, 72–84 (2019)

  18. J. Dong, J. Pan, Physics-based feature dehazing networks, in Computer Vision-ECCV, 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXX 16. Springer 2020, 188–204 (2020)

  19. W. Ren, L. Ma, J. Zhang, J. Pan, X. Cao, W. Liu, M.-H. Yang, Gated fusion network for single image dehazing. in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3253–3261 (2018)

  20. D. Chen, M. He, Q. Fan, J. Liao, L. Zhang, D. Hou, L. Yuan, G. Hua, Gated context aggregation network for image dehazing and deraining. in IEEE winter conference on applications of computer vision (WACV). IEEE 2019, 1375–1383 (2019)

  21. H. Wu, Y. Qu, S. Lin, J. Zhou, R. Qiao, Z. Zhang, Y. Xie, L. Ma, Contrastive learning for compact single image dehazing. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10551–10560 (2021)

  22. H. Dong, J. Pan, L. Xiang, Z. Hu, X. Zhang, F. Wang, M.-H. Yang. Multi-scale boosted dehazing network with dense feature fusion. in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2157–2167 (2020)

  23. L.-A. Tran, S. Moon, D.-C. Park, A novel encoder–decoder network with guided transmission map for single image dehazing. Procedia Comput. Sci. 204, 682–689 (2022)

  24. D. Engin, A. Genç, H. Kemal Ekenel, Cycle-dehaze: enhanced cyclegan for single image dehazing. in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 825–833 (2018)

  25. J.-Y. Zhu, T. Park, P. Isola, A. A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks. in Proceedings of the IEEE international conference on computer vision, pp. 2223–2232 (2017)

  26. A. Singh, A. Bhave, D.K. Prasad, Single image dehazing for a variety of haze scenarios using back projected pyramid network. in Computer Vision-ECCV, Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16. Springer 2020, 166–181 (2020)

  27. J. Pan, J. Dong, Y. Liu, J. Zhang, J. Ren, J. Tang, Y.-W. Tai, M.-H. Yang, Physics-based generative adversarial models for image restoration and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 43(7), 2449–2462 (2020)

  28. Q. Deng, Z. Huang, C.-C. Tsai, C.-W. Lin. Hardgan: a haze-aware representation distillation gan for single image dehazing. in European conference on computer vision. Springer, pp. 722–738 (2020)

  29. P. Sharma, P. Jain, A. Sur, Scale-aware conditional generative adversarial network for image dehazing. in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 2355–2365 (2020)

  30. P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros, Image-to-image translation with conditional adversarial networks. in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134 (2017)

  31. A. Kumar, R.K. Jha, N.K. Nishchal, A multi-exposure fusion framework for contrast enhancement of hazy images employing dynamic stochastic resonance. J. Vis. Commun. Image Represent. 81, 103388 (2021)

  32. A. Kumar, R.K. Jha, N.K. Nishchal, Joint gamma correction and multi-resolution fusion scheme for enhancing haze degraded images. Opt. Eng. 60(6), 063103 (2021)

  33. G.H. Babu, N. Venkatram, A survey on analysis and implementation of state-of-the-art haze removal techniques. J. Vis. Commun. Image Represent. 72, 102927 (2020)

  34. R. Chouhan, R.K. Jha, P.K. Biswas, Enhancement of dark and low-contrast images using dynamic stochastic resonance. IET Image Proc. 7(2), 174–184 (2013)

  35. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016)

  36. Z.-u. Rahman, D. J. Jobson, G. A. Woodell, Multiscale retinex for color rendition and dynamic range compression. in Applications of Digital Image Processing XIX, vol. 2847. SPIE, pp. 183–191 (1996)

  37. B. Koonce, Efficientnet, in Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization, pp. 109–123 (2021)

  38. O. Ronneberger, P. Fischer, T. Brox. U-net: Convolutional networks for biomedical image segmentation. in Medical Image Computing and Computer-Assisted Intervention-MICCAI, 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18. Springer 2015, 234–241 (2015)

  39. G. Dhara, R.K. Kumar, Spatial attention guided cgan for improved salient object detection. Front. Comput. Sci. 6, 1420965 (2024)

  40. G. Dhara, R.K. Kumar, Deepfusion-net: a u-net and cgan-based approach for salient object detection, in International Conference on Frontiers in Computing and Systems. (Springer, Berlin, 2023), pp.427–442

  41. C.O. Ancuti, C. Ancuti, F.-A. Vasluianu, R. Timofte, NTIRE 2020 challenge on nonhomogeneous dehazing. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 490–491 (2020)

  42. J. Shi, Q. Yan, L. Xu, J. Jia, Hierarchical image saliency detection on extended cssd. IEEE Trans. Pattern Anal. Mach. Intell. 38(4), 717–729 (2015)

  43. C.O. Ancuti, C. Ancuti, M. Sbert, R. Timofte. Dense-haze: a benchmark for image dehazing with dense-haze and haze-free images. in IEEE international conference on image processing (ICIP). IEEE 2019, 1014–1018 (2019)

  44. Q. Zhao, H. Wang, J. Dang, S. Li, R. Chang, Y. Fang, Z. Zhang, J. Peng, Y. Yang, Multistrengthening module-based salient object detection. Math. Probl. Eng. 2021, 1–12 (2021)

  45. M. Fu, H. Liu, Y. Yu, J. Chen, K. Wang. Dw-gan: a discrete wavelet transform gan for nonhomogeneous dehazing. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 203–212 (2021)

  46. J. Liu, H. Wu, Y. Xie, Y. Qu, L. Ma, Trident dehazing network, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 430–431 (2020)

  47. X. Qin, Z. Zhang, C. Huang, C. Gao, M. Dehghan, M. Jagersand, Basnet: Boundary-aware salient object detection. in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 7479–7489 (2019)

  48. J.-X. Zhao, J.-J. Liu, D.-P. Fan, Y. Cao, J. Yang, M.-M. Cheng. Egnet: Edge guidance network for salient object detection. in Proceedings of the IEEE/CVF international conference on computer vision, pp. 8779–8788 (2019)

  49. M. Feng, H. Lu, E. Ding, Attentive feedback network for boundary-aware salient object detection. in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 1623–1632 (2019)

  50. J. Wei, S. Wang, Q. Huang, F³Net: fusion, feedback and focus for salient object detection. in Proceedings of the AAAI conference on artificial intelligence, vol. 34, no. 07, pp. 12321–12328 (2020)

  51. M. A. Islam, M. Rochan, S. Naha, N. D. Bruce, Y. Wang, Gated feedback refinement network for coarse-to-fine dense semantic image labeling. arXiv preprint arXiv:1806.11266, (2018)

  52. J. Wei, S. Wang, Z. Wu, C. Su, Q. Huang, Q. Tian, Label decoupling framework for salient object detection. in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 13025–13034 (2020)

  53. Y. Pang, X. Zhao, L. Zhang, H. Lu. Multi-scale interactive network for salient object detection. in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9413–9422 (2020)

  54. J.-J. Liu, Q. Hou, M.-M. Cheng, J. Feng, J. Jiang. A simple pooling-based design for real-time salient object detection. in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 3917–3926 (2019)

  55. Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

Acknowledgements

We thank SRM University-AP for providing facilities for the completion of this research work.

Author information

Corresponding author

Correspondence to Ravi Kant Kumar.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Dhara, G., Kumar, R.K. A novel multiscale cGAN approach for enhanced salient object detection in single haze images. J Image Video Proc. 2024, 30 (2024). https://doi.org/10.1186/s13640-024-00648-x
