Research on application of multimedia image processing technology based on wavelet transform
EURASIP Journal on Image and Video Processing volume 2019, Article number: 24 (2019)
With the development of information technology, multimedia has become a common way of storing information. Traditional query techniques have struggled to keep pace with it, so retrieving useful information from large volumes of multimedia data has become a hot topic in search technology. This paper takes images, one form of multimedia information, as its research object. It exploits the ability of the wavelet transform to separate an image into low-frequency and high-frequency components and establishes a multimedia processing model based on the wavelet transform. Simulation results on face, vehicle, building, and landscape images show that retrieval accuracy and retrieval speed both depend on the wavelet basis function and on the number of decomposition layers. With four layers of wavelet decomposition and the cubic b-spline wavelet as the basis function, the classification result is optimal, with an accuracy rate of 89.08%.
1 Introduction
Multimedia generally refers to images, graphics, text, and sound. As an important information carrier, images are intuitive and rich in content, which makes them an important way of expressing information, and image processing has become a major part of multimedia processing technology. With the development of multimedia technology and the arrival of the information age, people are exposed to ever larger amounts of image information. How to organize, manage, and retrieve large-scale image databases effectively has become an urgent problem.
Research on multimedia image retrieval can be divided, by feature representation, into three directions.
(1) Retrieval based on color features. Color is the most widely used and most intuitive visual feature in image retrieval, and one of the most important perceptual features of an image, mainly because color is usually closely related to the objects or scenes the image contains. Compared with other visual features, color depends less on the size, orientation, and viewing angle of the image, so it is more stable and robust, and it is simple to compute. Users can specify the color features they want to query, and these are matched against a color feature library. Color-based feature extraction represents the color information of an image well. Current color feature extraction methods mainly include the color histogram [1, 2], color moments [3, 4], color sets, the color coherence vector [6, 7], and the color correlogram.
(2) Retrieval based on texture features. Texture features are visual features that reflect homogeneity in an image independently of color or brightness, and texture is an intrinsic property common to all surfaces. Texture features carry important information about the structure and arrangement of a surface and its relationship with the surrounding environment. They are therefore widely used in content-based image retrieval: by submitting an image that contains a certain texture, users can find other images with similar textures. The co-occurrence matrix [9,10,11] and the Gabor filter [12,13,14] are two commonly used texture methods.
(3) Retrieval based on shape features. The shape information of an image does not vary with color or other characteristics, so it is a stable feature of an object; for graphics in particular, shape is the only important feature. Shape features are generally represented in two ways: contour features, which use only the outer boundary of the object, and region features, which relate to the entire shape area. The most typical methods for the two types are Fourier shape descriptors [15, 16] and moment invariants [17, 18], respectively.
The concept of the wavelet transform was first proposed in 1974 by J. Morlet, a French engineer working on petroleum signal processing, who established the inversion formula from physical intuition and practical signal-processing experience. The essential difference between wavelet analysis and Fourier analysis is that Fourier analysis considers only the one-to-one mapping between the time domain and the frequency domain and represents the signal as a function of a single variable (time or frequency), whereas wavelet analysis uses a joint time-scale function to analyze non-stationary signals. Wavelet analysis also differs from time-frequency analysis: time-frequency analysis represents a non-stationary signal in the time-frequency plane, whereas wavelet analysis represents it in a two-dimensional plane that is not the time-frequency plane but the so-called time-scale plane. In the short-time Fourier transform, the signal is observed at a single resolution (that is, through a uniform window function); in wavelet analysis, the signal is observed at different scales or resolutions. This multi-scale, or multi-resolution, view is the basic point of wavelet analysis.
The basic idea of wavelet analysis is derived from Fourier analysis, of which it is a breakthrough development. It is both a powerful analytical technique and a fast computational tool, with theoretical significance as well as practical value. Wavelet analysis is well suited to characterizing the internal correlation of signal data and is highly effective in data compression and numerical approximation. Because of its “self-adaptive” and “mathematical microscope” properties, it has attracted attention in many disciplines.
In pattern recognition research, wavelet analysis can decompose a signal into low-frequency and high-frequency components that represent its characteristics. As a frequency analysis method, it has been widely used for feature analysis in many fields of signal analysis [19,20,21,22,23].
Wavelet analysis is also often used to analyze image features. Li et al. fused multi-sensor images by means of the wavelet transform. The goal of image fusion is to integrate complementary information from multi-sensor data so that the new image is better suited to human visual perception and to computer-processing tasks such as segmentation, feature extraction, and object recognition. Their scheme performs better than the Laplacian-based approach. They also suggest a performance measure based on specially generated test images, use it to evaluate different fusion methods and to compare the merits of different wavelet transform kernels, and report extensive experimental results. Chang and Kuo built on the wavelet transform to propose a multi-resolution method called the tree-structured wavelet transform, or wavelet packets. The transform is motivated by the observation that a large class of natural textures can be modeled as quasi-periodic signals whose dominant frequencies lie in the middle frequency channels. It can zoom into any desired frequency channel for further decomposition, whereas the conventional pyramid-structured wavelet transform decomposes further only in the low-frequency channel. On this basis they developed a progressive texture classification algorithm that is both computationally attractive and accurate.
This paper studies multimedia retrieval with images as the research object. It covers the wavelet decomposition of images, the extraction of image features, and the effects of different wavelet bases and of different numbers of decomposition layers on the recognition results. Based on the recognition results for face, vehicle, building, and landscape images, the optimal wavelet basis function and the optimal number of layers are selected, and an image retrieval model based on wavelet decomposition is established.
The contributions of this article are as follows:
(1) An image retrieval method based on wavelet decomposition is designed.
(2) The influence of different wavelet bases on image retrieval is analyzed, and the optimal wavelet basis for the decomposition is obtained.
(3) The influence of the number of decomposition layers on image retrieval is analyzed, and the optimal number of layers is obtained.
2 Proposed method
2.1 Wavelet theory
The basic idea of wavelet analysis originates from Fourier analysis, of which it is a breakthrough development. It is not only a powerful analytical technique but also a fast computing tool. Multi-resolution analysis in wavelet theory provides an effective way to describe and analyze signals at different resolutions and approximation accuracies, and it is highly valued in image processing and its applications. The wavelet transform method can be expressed as follows:
where ψ(t) is the mother wavelet, a is the scale factor, and τ is the translation factor.
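For reference, a standard textbook form of the continuous wavelet transform that matches the symbols defined above can be sketched as follows. The notation WT_f(a, τ) and the 1/√a normalization are assumptions based on the usual convention, not taken from the text.

```latex
WT_f(a,\tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\,
    \psi^{*}\!\left(\frac{t-\tau}{a}\right) dt,
    \qquad a > 0,\ \tau \in \mathbb{R}
```

Here ψ* denotes the complex conjugate of the mother wavelet; for a real-valued wavelet the conjugate can be dropped.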
In the past decade, wavelet analysis has made rapid progress in both theory and method. It has been studied from three different starting points: multi-resolution analysis, frames, and filter banks. At present, the main research directions and hotspots of wavelet theory are the description of function spaces, the construction of wavelet bases, cardinal interpolation wavelets, vector wavelets, high-dimensional wavelets, multi-band wavelets, and periodic wavelets. It is now recognized that multi-resolution processing in computer vision, subband coding in speech and image compression, non-stationary signal analysis based on non-uniform sampling grids, and wavelet series expansion in applied mathematics are different views of one and the same theory, namely wavelet theory.
In application, wavelet analysis has quite an extensive application space due to its good time-frequency localization characteristics, scale variation characteristics, and directional characteristics. Its application areas include many disciplines of mathematics, quantum mechanics, theoretical physics, signal analysis and processing, image processing, pattern recognition and artificial intelligence, machine vision, data compression, nonlinear analysis, automatic control, computational mathematics, artificial synthesis of music and language, medical imaging and diagnosis, geological exploration data processing, fault diagnosis of large-scale machinery, and many other aspects. The scope of its application is constantly expanding. Wavelet analysis is used as an important analytical theory and tool in almost all subject areas, and fruitful results have been achieved in the research and application process.
Let ψ(t) ∈ L²(R). If the Fourier transform of ψ(t) satisfies the following condition:
then ψ(t) is called the mother wavelet. The mother wavelet is translated and dilated to form a family of functions.
The continuous wavelet transform of the function f(t) is defined as:
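The three expressions referred to in this subsection are commonly written as below. This is a sketch under the usual textbook conventions: the symbols C_ψ, ψ_{a,τ}, and WT_f are assumed notation, and the condition shown is the standard admissibility condition.

```latex
% Admissibility condition on the Fourier transform \hat{\psi}(\omega) of \psi(t)
C_\psi = \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\omega)|^{2}}{|\omega|}\, d\omega < \infty

% Family generated from the mother wavelet by dilation a and translation \tau
\psi_{a,\tau}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-\tau}{a}\right),
    \qquad a > 0,\ \tau \in \mathbb{R}

% Continuous wavelet transform of f(t) as an inner product with that family
WT_f(a,\tau) = \langle f, \psi_{a,\tau} \rangle
    = \int_{-\infty}^{\infty} f(t)\, \psi_{a,\tau}^{*}(t)\, dt
```

The admissibility condition forces the Fourier transform of ψ to vanish at ω = 0, which is why a mother wavelet must have zero mean.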
2.2 Wavelet basis
The Belgian mathematician Ingrid Daubechies proposed a class of wavelets with the following characteristics, called Daubechies wavelets.
(1) Finite support in the time domain: the length of ψ(t) is finite, and its higher-order moments about the origin vanish, ∫ t^p ψ(t) dt = 0 for p = 0, 1, ..., N. The larger the value of N, the longer ψ(t) is.
(2) In the frequency domain, the Fourier transform of ψ(t) has an N-th order zero at ω = 0.
(3) ψ(t) and its integer translates are orthogonal.
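These properties can be checked numerically on the filter coefficients. The sketch below uses the four-tap Daubechies filter (commonly tabulated as db2 or D4); which of the paper's labels it corresponds to is an assumption, and the helper names are hypothetical. It verifies unit norm, orthogonality to the filter's own shift by two samples, and the vanishing of the first two discrete moments of the wavelet filter.

```python
import math

# Four-tap Daubechies low-pass (scaling) filter, commonly tabulated as db2 / D4.
s3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
h = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]

# High-pass (wavelet) filter from the quadrature-mirror relation g[k] = (-1)^k h[3-k].
g = [((-1) ** k) * h[3 - k] for k in range(4)]


def shifted_dot(a, b, shift):
    """Inner product of filter a with filter b displaced by `shift` samples."""
    return sum(a[k] * b[k - shift] for k in range(len(a)) if 0 <= k - shift < len(b))


def moment(filt, p):
    """Discrete p-th moment: sum over k of k**p * filt[k]."""
    return sum((k ** p) * c for k, c in enumerate(filt))


unit_norm = shifted_dot(h, h, 0)            # close to 1
shift_orthogonality = shifted_dot(h, h, 2)  # close to 0
cross_orthogonality = shifted_dot(h, g, 0)  # close to 0
moment_0 = moment(g, 0)                     # zeroth moment vanishes
moment_1 = moment(g, 1)                     # first moment vanishes
moment_2 = moment(g, 2)                     # second moment does not vanish
```

Longer Daubechies filters add further vanishing moments at the cost of a longer support, which is the trade-off described in item (1).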
2.3 Color characteristics of the image
Color features are the most widely used visual features in image retrieval. Color helps the human brain distinguish the brightness and boundaries of objects. In image processing, color is described by well-established models (color systems), each with its own characteristics and scope of use, and the color system can be chosen, or converted, according to the requirements of the task. A color feature is a global feature that describes the surface properties of the scene corresponding to an image or image region. Color features are generally pixel-based: every pixel belonging to the image or image region makes its own contribution. Color is often closely related to the objects and background in the image, and compared with other visual features, it depends less on the size, direction, and viewing angle of the image itself and is therefore more robust.
Because color is insensitive to changes in the direction, size, and so on of an image or image region, color features do not capture the local features of objects in the image well. In addition, when only color features are used and the database is very large, many unwanted images are often retrieved. The color histogram is the most commonly used way of expressing color features and is used in many image retrieval systems. It has the advantage of being unaffected by image rotation and translation and, after normalization, by changes in image scale. Its disadvantage is that it does not express the spatial distribution of color: it describes the proportion of each color in the whole image without regard to where each color is located, so it cannot describe the objects in the image. Color histograms are particularly well suited to describing images that are difficult to segment automatically.
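The invariance properties of the color histogram, and its blindness to spatial layout, can be illustrated with a minimal sketch. The image representation (rows of RGB tuples), the bin count, and all function names are assumptions made for this example; histogram intersection is used as one common similarity measure, not necessarily the one used in the experiments.

```python
def color_histogram(image, bins=4):
    """Normalized joint RGB histogram of an image given as rows of (r, g, b) tuples (0-255)."""
    counts = [0] * (bins ** 3)
    total = 0
    for row in image:
        for r, g, b in row:
            # Quantize each 8-bit channel to a bin index in [0, bins - 1].
            qr, qg, qb = r * bins // 256, g * bins // 256, b * bins // 256
            counts[(qr * bins + qg) * bins + qb] += 1
            total += 1
    return [c / total for c in counts]


def rotate_90(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def upscale_2x(image):
    """Nearest-neighbour enlargement by a factor of two in both directions."""
    out = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; equals 1 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red, green, blue = (250, 10, 10), (10, 250, 10), (10, 10, 250)
image = [[red, red, blue],
         [red, green, blue]]
shuffled = [[blue, red, green],
            [blue, red, red]]  # same colors, different spatial layout

base = color_histogram(image)
same_after_rotation = color_histogram(rotate_90(image)) == base
same_after_scaling = color_histogram(upscale_2x(image)) == base
shuffle_similarity = histogram_intersection(base, color_histogram(shuffled))
```

The rotated and enlarged images yield the same normalized histogram as the original, and so does the spatially shuffled image, which shows both the robustness and the limitation described above.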
2.4 Image texture features
Image texture reflects local structure in an image: the gray level or color of the pixels in a neighborhood varies in a certain way, and the variation is spatially and statistically correlated. A texture consists of two elements, the texture primitives and the arrangement of those primitives. Texture analysis methods include statistical methods, structural methods, and model-based methods.
A texture feature is also a global feature that describes the surface properties of the scene corresponding to an image or image region. However, because texture is only a characteristic of the surface of an object and does not fully reflect its essential properties, high-level image content cannot be obtained from texture features alone. Unlike color features, texture features are not pixel-based; they require statistical calculations over regions that contain multiple pixels. In pattern matching, such regional features have a clear advantage: a match does not fail because of small local deviations. As statistical features, texture features are often rotation invariant and fairly resistant to noise. They also have disadvantages. One obvious drawback is that when the resolution of the image changes, the computed texture may deviate considerably. In addition, because of illumination and reflection, the texture seen in the image is not necessarily the actual texture of the object's surface; reflections in water and reflections from smooth metal surfaces, for example, can change the apparent texture. Since these are not characteristics of the object itself, such false textures can sometimes mislead a search that relies on texture information.
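As an illustration of a statistical texture feature computed over a region, the sketch below builds a gray-level co-occurrence matrix, one of the texture methods named in the introduction, and derives two common statistics from it. The function names, the horizontal pixel offset, and the tiny four-level test patches are assumptions chosen for the example; this is not the feature set used in the experiments.

```python
def cooccurrence_matrix(gray, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(gray), len(gray[0])
    pairs = 0
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < height and 0 <= x2 < width:
                counts[gray[y][x]][gray[y2][x2]] += 1
                pairs += 1
    return [[c / pairs for c in row] for row in counts]


def contrast(glcm):
    """Large when neighbouring pixels differ strongly in gray level."""
    return sum(((i - j) ** 2) * p for i, row in enumerate(glcm) for j, p in enumerate(row))


def energy(glcm):
    """Large when a few gray-level pairs dominate the region."""
    return sum(p * p for row in glcm for p in row)


# Two 4 x 4 patches with four gray levels (0-3): smooth blocks versus sharp stripes.
smooth = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 3, 3],
          [2, 2, 3, 3]]
stripes = [[0, 3, 0, 3],
           [0, 3, 0, 3],
           [0, 3, 0, 3],
           [0, 3, 0, 3]]

smooth_contrast = contrast(cooccurrence_matrix(smooth, 4))
stripes_contrast = contrast(cooccurrence_matrix(stripes, 4))
stripes_energy = energy(cooccurrence_matrix(stripes, 4))
```

Because the statistics are accumulated over all pixel pairs in the region, changing a single pixel shifts them only slightly, which is the tolerance to local deviations noted above.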
Texture features are effective when searching for texture images that differ greatly in coarseness, density, and the like. However, when easily distinguishable properties such as coarseness and density differ little between textures, the usual texture features have difficulty reflecting the differences that human vision perceives between them.
3 Experimental results
3.1 Data sources
The face data come from the CAS-PEAL face database of the Institute of Computing Technology, Chinese Academy of Sciences. The database was built in 2003 and includes 1040 face samples. Its face images are complex: they include faces in different poses, such as frontal and profile views, and face samples taken at different times. To meet the requirement of sample diversity, the database includes men and women of different ages, and the images include a variety of backgrounds.
To verify the correctness and robustness of the method, the second data set consists of everyday photographs taken with an MI 4, covering vehicles, buildings, and landscapes, with 200 photos of each type. The picture size is 92 × 112, and the pictures were converted from JPEG (Joint Photographic Experts Group) to BMP (Bitmap) format.
3.2 Experimental environment
The data processing in this paper is performed in the MATLAB R2014b (version 8.4) software environment. The hardware is an Intel Core i7-4710HQ quad-core processor with 4 GB of Kingston DDR3L memory, running the Windows 7 Ultimate 64-bit SP1 operating system.
3.3 Classification method
To guarantee the stability of the classification, this paper uses a Support Vector Machine (SVM) with a linear kernel function as the classifier. The training and test sets are formed by 10-fold cross-validation: the samples are divided into 10 subsets, and each subset in turn is used as the test set while the other nine are used as the training set.
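The 10-fold partition can be sketched as follows. Only the splitting step is shown; the SVM itself is omitted. The function name, the random seed, and the use of 1040 samples (the face sample count given in Section 3.1) are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # k disjoint subsets of nearly equal size that together cover every sample.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1040, k=10))
fold_sizes = [(len(train), len(test)) for train, test in splits]
```

Each sample appears in exactly one test set, so averaging the accuracy over the 10 rounds uses every sample for testing once and for training nine times.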
4.1 Image preprocessing
The wavelet transform divides the image into a high-frequency part and a low-frequency part. The low-frequency part contains the overall structure of the original image, and the high-frequency part preserves its details. The main features for multimedia retrieval therefore lie in the low-frequency part. Figure 1 compares the original picture with the low-frequency part after the wavelet transform.
As Fig. 1 clearly shows, the picture becomes blurred after the wavelet transform, but the basic features of the face, such as the eyes, mouth, nose, cheeks, and eyebrows, are still very clear. The blurring indicates that the number of features has been reduced, while the clearly visible basic features indicate that this reduction does not impair feature extraction.
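The split into one low-frequency and three high-frequency subbands can be sketched with a single level of the two-dimensional Haar transform. The Haar wavelet is used here only because it is the simplest orthogonal wavelet; it stands in for the bases compared in this paper, and the function names and the toy 4 × 4 image are assumptions.

```python
def haar_step_1d(values):
    """One Haar analysis step: scaled pairwise sums (low-pass) and differences (high-pass)."""
    s = 2 ** 0.5
    low = [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / s for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def haar_dwt2(image):
    """One-level 2-D Haar transform; image sides must be even.

    Returns (ll, lh, hl, hh): ll is low-pass in both directions, lh is low-pass
    along rows and high-pass along columns, hl is the reverse, hh is high-pass in both.
    """
    row_low, row_high = [], []
    for row in image:
        low, high = haar_step_1d(row)
        row_low.append(low)
        row_high.append(high)

    def split_columns(block):
        lows, highs = [], []
        for col in transpose(block):
            low, high = haar_step_1d(col)
            lows.append(low)
            highs.append(high)
        return transpose(lows), transpose(highs)

    ll, lh = split_columns(row_low)
    hl, hh = split_columns(row_high)
    return ll, lh, hl, hh


# Piecewise-constant toy image: all detail coefficients are zero, and each
# ll coefficient is twice the mean of the 2 x 2 block it summarizes.
image = [[10, 10, 20, 20],
         [10, 10, 20, 20],
         [30, 30, 40, 40],
         [30, 30, 40, 40]]
ll, lh, hl, hh = haar_dwt2(image)
```

The ll subband is a half-resolution, smoothed copy of the image, which corresponds to the blurred but still recognizable picture described above.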
4.2 The influence of the wavelet parameters on the classification
The factors that affect the recognition result and efficiency are mainly the wavelet basis and the number of wavelet layers. The choice of wavelet basis directly affects the quality of feature extraction and hence the final retrieval rate. The number of wavelet layers determines the number of features used in recognition: the higher the number of layers, the more features of the image. This paper compares the effects of five wavelet bases on the recognition results: Daub(2), Daub(4), Daub(6), the cubic b-spline wavelet, and the orthogonal base wavelet. It also compares the effects of 1-, 2-, 3-, 4-, and 5-layer wavelet transforms on classification efficiency.
Wavelet decomposition separates the image into different frequency bands, and decompositions with different numbers of layers separate the image to different degrees. The following example shows a three-layer wavelet decomposition with DB2 as the wavelet basis. The image, shown in Fig. 2, is a building from everyday life.
Figure 3 shows the wavelet decomposition of Fig. 2. It can be seen that the information in the picture is concentrated mainly in the low-frequency part and that the high-frequency part carries very little information.
The decomposed image is then reconstructed. Figure 4 shows the result of the two-layer reconstruction of the above image, and Fig. 5 shows the result of the three-layer reconstruction. In each figure, the three pictures from left to right show the reconstruction after the high-frequency part of the corresponding layer has been discarded. Whichever number of layers is used, the reconstructed images show no obvious tendency to worsen.
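The effect of discarding the high-frequency parts can be sketched numerically. The example below uses the Haar wavelet rather than DB2 to keep the code short (an explicit simplification), decomposes a smooth synthetic image over several layers, reconstructs it from the low-frequency part alone, and measures the fraction of signal energy that is retained. The function names and the synthetic 8 × 8 gradient image are assumptions.

```python
def haar_approximation(image):
    """One Haar analysis level, keeping only the low-frequency (LL) subband."""
    return [[(image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 2.0
             for x in range(0, len(image[0]), 2)]
            for y in range(0, len(image), 2)]


def haar_synthesis_without_details(ll):
    """One Haar synthesis level with every high-frequency coefficient set to zero."""
    out = []
    for row in ll:
        wide = [c / 2.0 for c in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def low_frequency_reconstruction(image, levels):
    """Decompose `levels` times, drop all detail subbands, and reconstruct."""
    approx = image
    for _ in range(levels):
        approx = haar_approximation(approx)
    for _ in range(levels):
        approx = haar_synthesis_without_details(approx)
    return approx


def energy(image):
    return sum(v * v for row in image for v in row)


def retained_energy(image, levels):
    """Fraction of the image energy that survives when only the low-frequency part is kept."""
    return energy(low_frequency_reconstruction(image, levels)) / energy(image)


# Smooth synthetic 8 x 8 image: a bright background with a gentle gradient.
image = [[100 + 4 * x + 2 * y for x in range(8)] for y in range(8)]
retained = [retained_energy(image, levels) for levels in (1, 2, 3)]
```

For a smooth image like this one, the retained fraction stays close to 1 at every depth, which is consistent with the observation that most of the information sits in the low-frequency part. Images with strong fine detail would lose more.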
Figure 6 shows how different wavelet bases and different numbers of layers affect recognition on the standard face database. The choice of both the number of layers and the wavelet basis affects the result. For Daub(2), the number of layers has little effect on the accuracy rate. For Daub(4), the accuracy rate reaches its maximum at two layers and then decreases slowly as the number of layers increases. When Daub(6), the cubic b-spline wavelet, or the orthogonal base wavelet is used as the wavelet basis function, the number of layers has the greatest impact, and the classification result is optimal at four layers.
As Fig. 6 shows, the classification effect is optimal when the cubic b-spline wavelet is used as the wavelet basis function with four layers of wavelet decomposition; the accuracy rate is 89.08%.
With the cubic b-spline wavelet as the wavelet basis function, the effect of the number of layers on image recognition efficiency is analyzed for decompositions of up to five layers. Table 1 shows the average image retrieval time for each number of layers. As the number of wavelet layers increases, the retrieval time per image gradually increases; it rises markedly from layers 1 and 2 to layers 3 and 4 and rises markedly a second time at layer 5. The recognition rates in Fig. 6 show that three- and four-layer decompositions achieve higher recognition rates than one-, two-, and five-layer decompositions. Thus, although one- and two-layer decompositions are faster, some features are lost and recognition is poor, while five-layer decomposition produces too many redundant features, which also harms recognition.
4.3 Recognition results of non-face images
The results above show that, for the standard face database, recognition is most effective with the four-layer wavelet transform and the cubic b-spline wavelet as the wavelet basis function. Table 2 shows the recognition results obtained with this configuration for vehicle, building, and landscape images.
The results in Table 2 show that, with four-layer wavelet analysis and the cubic b-spline wavelet as the wavelet basis, the recognition rates for all three types of pictures are high. The recognition rates for buildings and vehicles are higher than that for landscapes, possibly because vehicles and buildings have distinct frequency characteristics while landscapes do not. For the same number of samples, landscapes have the lowest retrieval accuracy yet consume the most time: their training time is 2.7 times that of vehicles and 2.4 times that of buildings, and the variance of their retrieval time is also relatively large. This shows that the method is less effective at extracting features from landscape images.
4.4 Low-frequency and high-frequency recognition rate
Wavelet analysis divides the original signal into high-frequency and low-frequency parts. The high-frequency part describes the details of the signal at a given layer, and the low-frequency part describes its general shape. The features used so far in this paper are mixed features that combine both. Reducing the number of features is the most direct way to shorten recognition time. Figure 7 shows the recognition results for the four kinds of images in the data sources. The recognition rate of low-frequency features is far higher than that of high-frequency features, reaching 86.99%, 91.41%, 89.75%, and 75%, respectively. The recognition rate of low-frequency features is thus comparable to that of mixed features, whereas the recognition rate of high-frequency features is lower than that of mixed features. These results indicate that, in wavelet-based multimedia recognition, high-frequency features are redundant.
Multimedia resources have become one of the ways in which people obtain information, and intelligent querying of multimedia information is a new hotspot in data mining. The design of the query algorithm is one of the main aspects of such querying. Although the wavelet transform has been used successfully in image research, the problem of jointly choosing the number of layers and the wavelet basis function for image retrieval has not been solved. In this paper, wavelet analysis is used as an image feature query method to analyze face, vehicle, building, and landscape images. Different wavelet basis functions and numbers of decomposition layers are analyzed, with accuracy and query speed as evaluation indicators, and their effects on the results are compared.
JPEG: Joint Photographic Experts Group
SVM: Support Vector Machine
S. Ahmadi, A. Manivarnosfaderani, B. Habibi, Motor oil classification using color histograms and pattern recognition techniques. J. AOAC Int. 101, 1967–1976 (2018)
H. Liu, F. Zhao, V. Chaudhary, Pareto-based interval type-2 fuzzy c-means with multi-scale JND color histogram for image segmentation. Digital Signal Process. 76, 75–83 (2018)
L. Li, K. Liu, F. Cheng, An improved TLD with Harris corner and color moment. Proceedings of the Spie 225, 102251P (2017)
V. Vinayak, S. Jindal, CBIR system using color moment and color auto-Correlogram with block truncation coding. International Journal of Computer Applications 161(9), 1–7 (2017)
P.G.J. Barten, Effects of quantization and pixel structure on the image quality of color matrix displays. J. Soc. Inf. Disp. 1(2), 147–153 (2012)
I.M. Stephanakis, G.C. Anastassopoulos, L. Iliadis, A self-organizing feature map (SOFM) model based on aggregate-ordering of local color vectors according to block similarity measures. Neurocomputing 107, 97–107 (2013)
I.M. Stephanakis, G.C. Anastassopoulos, L.A. Iliadis, Self-organizing feature map (SOFM) model based on aggregate-ordering of local color vectors according to block similarity measures. Neurocomputing 107(4), 97–107 (2013)
D. Chai, K.N. Ngan, Face segmentation using skin-color map in videophone applications. IEEE Trans Csvt 9(4), 551–564 (1999)
I. Pantic, Z. Nesic, J.P. Pantic, et al., Fractal analysis and gray level co-occurrence matrix method for evaluation of reperfusion injury in kidney medulla. J. Theor. Biol. 397(2), 61–67 (2016)
I. Pantic, D. Dimitrijevic, D. Nesic, et al., Grey level co-occurrence matrix algorithm as pattern recognition biosensor for oxidopamine-induced changes in chromatin architecture. J. Theor. Biol. 406, 124–128 (2016)
I. Pantic, D. Dimitrijevic, D. Nesic, et al., Gray level co-occurrence matrix algorithm as pattern recognition biosensor for oxidopamine-induced changes in lymphocyte chromatin architecture. J. Theor. Biol. 406, 124–128 (2016)
A.K. Jain, F. Farrokhnia, et al., Unsupervised texture segmentation using Gabor filters. Pattern Recogn. 24(12), 1167–1186 (1991)
A.K. Jain, F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, in IEEE International Conference on Systems, Man and Cybernetics, 1990. Conference proceedings (IEEE, 2002), pp. 1167–1186
S.E. Grigorescu, N. Petkov, P. Kruizinga, Comparison of texture features based on Gabor filters. IEEE transactions on image processing: a publication of the IEEE Signal Processing Society 11(10), 1160–1167 (2002)
D. Navarro-Alarcon, Y.H. Liu, Fourier-based shape servoing: a new feedback method to actively deform soft objects into desired 2-D image contours. IEEE Trans. Robot. PP(99), 1–8 (2018)
H. Yun, B. Li, S. Zhang, Pixel-by-pixel absolute three-dimensional shape measurement with modified Fourier transform profilometry. Appl. Optics 56(5), 1472 (2017)
J. Flusser, T. Suk, Pattern recognition by affine moment invariant. Pattern Recogn. 26(1), 167–174 (1993)
D. Marin, A. Aquino, M.E. Gegundezarias, et al., A new supervised method for blood vessel segmentation in retinal images by using gray-level and moment invariants-based features. IEEE Trans. Med. Imaging 30(1), 146 (2011)
M.R. Banham, N.P. Galatsanos, H.L. Gonzalez, et al., Multichannel restoration of single channel images using a wavelet-based subband decomposition. IEEE Trans. Image Process. 3(6), 821–833 (2016)
H. Shao, X. Deng, F. Cui, Short-term wind speed forecasting using the wavelet decomposition and AdaBoost technique in wind farm of East China. Iet Generation Transmission & Distribution 10(11), 2585–2592 (2016)
Y. Guo, B.Z. Li, Blind image watermarking method based on linear canonical wavelet transform and QR decomposition. IET Image Process. 10(10), 773–786 (2016)
K.R. Singh, S. Chaudhury, Efficient technique for rice grain classification using back-propagation neural network and wavelet decomposition. IET Comput. Vis. 10(8), 780–787 (2017)
W.W. Boles, B. Boashash, A human identification technique using images of the iris and wavelet transform. IEEE Trans. Signal Process. 46(4), 1185–1188 (1998)
H. Li, B.S. Manjunath, S.K. Mitra, Multisensor image fusion using the wavelet transform. Graphical models and image processing 57(3), 235–245 (1995)
T. Chang, C.-C.J. Kuo, Texture analysis and classification with tree-structured wavelet transform. IEEE Trans. Image Process. 2(4), 429–441 (1993)
The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
Availability of data and materials
Please contact author for data requests.
301 Art Center Chung-Ang University 221 Heukseok-dong Dongjak-gu, Seoul, 156–756 Korea.
Kun Sui was born in Qingdao, Shandong, P.R. China, in 1982. Doctor of Technology Art, Lecturer. Graduated from the Korea Dong Yang University in 2009. Worked in Qingdao Agricultural University. His research interests include New Media Art and digital image processing.
*Author for correspondence:
Hyung-Gi Kim was born in Korea in 1960. Doctor of Technology Art, Professor. Graduated from the Soongsil University in 2009. Worked in Graduate school of Advanced Imaging Science, Multimedia and Film Chung-Ang University, Seoul, Korea. He has held eleven successful solo Media Art Exhibitions and participated in many group exhibitions. His research focuses on 3D display systems, projection mapping, kinetic art, interactive media art, and media performance.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Sui, K., Kim, HG. Research on application of multimedia image processing technology based on wavelet transform. J Image Video Proc. 2019, 24 (2019). https://doi.org/10.1186/s13640-018-0396-1