- Open Access
Automatic detection technology of sports athletes based on image recognition technology
© The Author(s). 2019
- Received: 28 October 2018
- Accepted: 7 January 2019
- Published: 18 January 2019
To improve motion recognition of athletes based on image recognition technology, this study takes diving athletes as the research material and, drawing on the current state of image recognition research, studies athlete motion recognition from the image processing side. The gradient segmentation method is used to segment the research object from the video image, the traditional image grayscale method is improved, and an image segmentation algorithm adapted to the diving motion is obtained. On this basis, the study combines Gaussian mixture background modeling with background subtraction to detect and extract the target human body region, and uses morphological operators to deal with noise and holes in the foreground image. The example analysis shows that the proposed method has some practical value and can provide a theoretical reference for subsequent related research.
- Image recognition
- Video image
In recent years, the application of computer vision technology to sports video has become a new research hot spot. Sports competitions are among the most popular programs, so sports video has become an important type of media data with a large audience and broad application prospects. Image recognition processing of sports video can yield effective detection results, which benefits athlete training and the smooth running of competitions. Researchers at home and abroad have done a lot of work on moving target detection and segmentation and on human motion pose recognition. Research on target detection in sports video mainly covers sports field detection, athlete detection, and ball detection. The scope of pose recognition research includes gait recognition, gesture recognition, and so on.
Target segmentation is a key step in the identification process. It usually means separating the target of interest from its background as an independent part. Traditional segmentation methods include background subtraction, interframe difference, and optical flow, as well as skin color detection for human detection. Among them, the background difference method is the simplest and most direct: the background model image is subtracted from the image to be segmented, and the difference is the motion region. The core of this method is the construction of the background model. Croston et al. established a statistical background model for each pixel in the picture. Walseth et al. combined pixel color and gradient information to create an adaptive background model. Lu built an adaptive background model based on Kalman filtering. The weakness of the background difference method is that it is easily disturbed by external conditions such as light and weather. Interframe difference separates foreground from background by calculating the difference between corresponding pixels in two or more frames. Young et al. applied the interframe difference method to detect players on the field during a football match. This method works well for video with a static background, but it can neither determine the target position accurately nor extract the complete target. Neither the background difference method nor the interframe difference method is ideal when the background itself is moving. In diving footage the camera usually follows the athlete downwards, so the background moves considerably and segmentation with either method performs poorly.
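The two difference-based methods described above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists; the function names, the threshold of 25, and the toy frames are hypothetical and are not taken from the cited works.

```python
def abs_diff_mask(frame_a, frame_b, threshold):
    """Return a binary mask: 1 where |frame_a - frame_b| exceeds threshold."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def background_subtraction(frame, background, threshold=25):
    # Compare the current frame with a background model image.
    return abs_diff_mask(frame, background, threshold)


def interframe_difference(prev_frame, frame, threshold=25):
    # Compare two consecutive frames; no background model is needed.
    return abs_diff_mask(frame, prev_frame, threshold)


# Toy 3x4 frames (hypothetical values): a bright object moves one pixel right.
background = [[10, 10, 10, 10] for _ in range(3)]
frame_1 = [[10, 10, 10, 10], [10, 200, 10, 10], [10, 10, 10, 10]]
frame_2 = [[10, 10, 10, 10], [10, 10, 200, 10], [10, 10, 10, 10]]

fg_mask = background_subtraction(frame_2, background)
motion_mask = interframe_difference(frame_1, frame_2)
```

In this toy case the background difference marks only the object's current position, while the interframe difference marks both the old and the new position, which illustrates why the latter does not localize the target precisely.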
The optical flow method computes the optical flow field between adjacent frames to find the pixel areas that correspond to the moving target model, and then combines these areas to form the moving target, although the detected target area is usually unreliable. Hal takes advantage of the large difference in direction of movement between the athlete and the background in diving competition, and of the fact that most of the athlete’s body is exposed, and combines the optical flow method with skin color detection to segment the target.
Feature-based algorithms use certain features of the tracking target to distinguish it from other objects in a video frame. Some algorithms use a background image as a reference, the so-called background frame. All objects in the “difference frame”, obtained by subtracting the background frame from the current frame, are treated as candidate tracking targets. To distinguish the tracking target from other objects, the target is characterized by its features; parameterized shapes and color distributions in the target representation can serve as such features. A neural network classifier is trained with these features and manually marked tracking targets, and the trained classifier is then used to distinguish the tracking target from other objects. A color histogram over an elliptical region has also been used to track players on the court. These algorithms rely mostly on low-level image information, acquire features in a simple manner, and describe the whole behavior with coarse features, which makes them sensitive to noise, viewpoint changes, and variation between subjects.
In recent years, scholars at home and abroad have done a lot of research on the challenging topic of human behavior analysis, recognition, and understanding. Based on the projection information of the moving target, Guo et al. used the PCA algorithm to classify and recognize nine different postures such as standing, lying, sitting, and walking. These actions involve the whole body, and the differences between them are large. Cucchiara et al. used Bayesian classifiers to classify four distinct postures: crouching, sitting, standing, and lying. In addition, Hu Changbo et al. identified six sets of Yang-style Taijiquan movements through PCA modeling. Although these methods achieve a high recognition rate for the postures they target, they were all applied in specific environments with a simple background, a static camera, and slow movement. Leonard et al. used template matching to analyze the athlete’s body posture in diving video, but template-based analysis places high demands on the capacity and quality of the template library. Crance used a multi-feature fusion method to extract multiple features from the segmented target image and used the SVM output probability to identify and classify three aerial poses.
The above review shows that image detection and image recognition technologies are well developed and have been applied in many industries. This research builds on image recognition technology and takes sports as its research object: it identifies the movement process of athletes, performs image processing on sports videos, and extracts effective information from them, with the aim of further improving the efficiency of sports training and competition.
From the perspective of segmentation, segmenting a video object in space means detecting and segmenting the moving target, that is, separating the independent regions of interest or meaning in the video sequence from the background. Target segmentation is the most basic part of video motion pose recognition: if the target is correctly detected and segmented in each frame, correct pose recognition has a sound basis. However, target detection is subject to many unknown factors, and suppressing these external interferences often comes at the cost of real-time performance.
The motion in a two-dimensional image is the projection onto the imaging plane of the three-dimensional velocity vectors of visible points in the scene. An estimate of the instantaneous motion of each point across a sequence of images is generally called an optical flow field or velocity field. Optical flow computation methods are generally divided into five types: gradient-based, energy-based, matching-based, phase-based, and neurodynamic methods. Among them, the gradient-based method, the most widely studied, computes the optical flow field from image gray values. It assumes that the gray value of a moving point remains unchanged between frames and derives the optical flow constraint equation from this assumption. However, since this equation alone does not determine the optical flow uniquely, additional constraints need to be introduced. According to the constraints introduced, gradient-based methods fall into two categories: global constraint methods and local constraint methods. Typical algorithms are the Horn-Schunck algorithm and the Lucas-Kanade algorithm, respectively. In comparison, the Lucas-Kanade algorithm offers much better accuracy and speed and is more robust to noise. The calculation method of the algorithm will be described in detail below.
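As an illustration of the local constraint idea, the following is a minimal single-window sketch of the textbook Lucas-Kanade least-squares step, not necessarily the exact formulation used in this paper. It assumes the standard optical flow constraint I_x u + I_y v + I_t = 0 and solves it over a small window; the function name, window size, and synthetic test image are hypothetical.

```python
def lucas_kanade_window(prev, curr, cy, cx, half=2):
    """Estimate the flow (u, v) at pixel (cy, cx) from a (2*half+1)^2 window.

    Solves the 2x2 normal equations of the constraint Ix*u + Iy*v + It = 0.
    Returns None when the system is singular (aperture problem).
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Synthetic test pattern (hypothetical): a paraboloid shifted one pixel to the right.
size = 21
prev = [[(x - 10) ** 2 + (y - 10) ** 2 for x in range(size)] for y in range(size)]
curr = [[(x - 11) ** 2 + (y - 10) ** 2 for x in range(size)] for y in range(size)]

flow = lucas_kanade_window(prev, curr, cy=10, cx=10)
```

The window must contain gradients in more than one direction; on a uniform ramp the determinant vanishes and the function returns None, which is the reason extra constraints are needed in the first place.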
The main goal of this paper is to identify the types of poses that athletes perform in a complex environment. Therefore, to represent the human motion state, this study extracts key features of the overall shape and motion of the human body. However, characterizing the motion state with appearance-based shape features alone, or with motion features alone, has shortcomings. This paper therefore uses feature fusion, representing the athlete’s postures with multiple fused features. The selected features will be described in detail below.
Here, $\alpha$ is the update rate of the Gaussian distribution parameters, and the parameters of Gaussian distributions that are not matched remain unchanged. In building the background model, the number of Gaussian distributions describing each pixel is set to $K = 3$. The background model is initialized first, with initial weights $w_{1,0} = 1$, $w_{2,0} = 1$, $w_{3,0} = 1$. The pixels of the first frame are used to initialize the mean of the first Gaussian distribution, and the means of the remaining Gaussian distributions are set to 0. The standard deviation of each model takes a relatively large value, $\sigma_{i,0} = 30$; the weight update rate is $\beta = 0.33$, the learning rate is $\alpha = 0.7$, and the threshold is $T = 0.7$. If no Gaussian distribution matches $x_t$ at detection time, the Gaussian distribution with the lowest priority is removed and a new Gaussian distribution is introduced according to $x_t$, with a smaller weight and a larger variance, after which the weights are normalized.
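The update procedure can be sketched for a single grayscale pixel as follows. This is a minimal sketch assuming the common Stauffer-Grimson form of the update rules, since the paper's own equations are not reproduced here. The values K = 3, sigma_0 = 30, beta = 0.33, alpha = 0.7 and T = 0.7 come from the text; the 2.5-sigma matching rule, the priority w/sigma, the replacement weight of 0.05, the lower bound on sigma, and the normalized equal initial weights are assumptions made for illustration.

```python
import math

K = 3              # Gaussians per pixel (from the text)
SIGMA_0 = 30.0     # initial standard deviation (from the text)
BETA = 0.33        # weight update rate (from the text)
ALPHA = 0.7        # learning rate for mean and variance (from the text)
T = 0.7            # background weight threshold (from the text)
MATCH_SIGMAS = 2.5 # assumption: match if |x - mu| <= 2.5 * sigma
NEW_WEIGHT = 0.05  # assumption: "smaller weight" for a replacement Gaussian
MIN_SIGMA = 4.0    # assumption: floor that keeps sigma away from zero


def priority(g):
    return g["w"] / g["sigma"]


def init_pixel_model(first_value):
    # First Gaussian takes the first-frame value, the others have mean 0.
    # Assumption: equal initial weights, normalised so that they sum to 1.
    means = [float(first_value)] + [0.0] * (K - 1)
    return [{"w": 1.0 / K, "mu": m, "sigma": SIGMA_0} for m in means]


def update_pixel_model(model, x):
    """Update the mixture with observation x; return True if x is background."""
    ranked = sorted(model, key=priority, reverse=True)
    matched = next(
        (g for g in ranked if abs(x - g["mu"]) <= MATCH_SIGMAS * g["sigma"]), None
    )
    if matched is None:
        # Replace the lowest-priority Gaussian with one centred on x.
        worst = min(model, key=priority)
        worst.update(w=NEW_WEIGHT, mu=float(x), sigma=SIGMA_0)
    else:
        for g in model:
            g["w"] = (1 - BETA) * g["w"] + BETA * (1.0 if g is matched else 0.0)
        diff = x - matched["mu"]
        matched["mu"] += ALPHA * diff
        variance = (1 - ALPHA) * matched["sigma"] ** 2 + ALPHA * diff ** 2
        matched["sigma"] = max(MIN_SIGMA, math.sqrt(variance))
    total = sum(g["w"] for g in model)
    for g in model:
        g["w"] /= total
    if matched is None:
        return False
    # Background = highest-priority Gaussians whose cumulative weight first exceeds T.
    cumulative = 0.0
    for g in sorted(model, key=priority, reverse=True):
        if g is matched:
            return True
        cumulative += g["w"]
        if cumulative > T:
            return False
    return False


model = init_pixel_model(100)
results = [update_pixel_model(model, x) for x in [100, 100, 100, 250, 100]]
```

A pixel that keeps observing the same value is classified as background, while a sudden outlier matches no Gaussian, triggers the replacement step, and is reported as foreground.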
To study the detection performance of image-based automatic detection technology for athletes, this paper uses AVI format videos as sample data, all taken from live footage of 10 m platform men's and women's individual diving competitions downloaded from the Internet. We chose 160 video segments, each covering a complete dive, usually lasting between 3 and 6 s. We also take 3109 frames containing the desired motions as samples, at a resolution of 480 × 360. During a dive, the athlete usually performs some transitions and connecting movements in addition to one of the three postures. The entire process is short, usually 3–4 s, so the athlete spends even less time on the prescribed posture itself. To identify the actions performed during the dive more accurately, this paper classifies each frame of the sample video.
Table 1 Recognition rate (%) of the template matching result and the SVM classifier recognition result for each posture (rows include bending the body and holding the knee)
In diving footage, the camera usually moves as the athlete descends, which produces global motion. In the segmentation stage, the optical flow method is used to estimate and eliminate the global motion region caused by camera motion, and skin color detection is then used to determine the target position within the motion region. Eliminating global motion also removes a large amount of noise compared with applying skin color detection directly, as shown in Fig. 6. When skin color detection is applied directly, the background is usually the audience in the diving venue, so many background colors, such as the skin of spectators, the venue facilities, and clothing, may be close to the skin color of the target human body and be mistakenly detected as target skin, as shown in Fig. 6b. Figure 6c shows the image after global motion is removed using the optical flow method, and Fig. 6d shows the result of skin color detection after the removal of global motion. The noise is much reduced compared with Fig. 6b.
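One simple way to picture this two-stage step is sketched below. It assumes that a dense flow field and a skin-color mask are already available, and it estimates the camera motion as the median flow vector, which is an assumption made for illustration rather than the paper's stated estimator. All names, the threshold, and the toy data are hypothetical.

```python
import math
from statistics import median


def remove_global_motion(flow, threshold=1.0):
    """flow: 2-D grid of (u, v) vectors.

    Returns the estimated global (camera) motion and a binary mask that is 1
    where the local flow deviates from it by more than threshold.
    """
    global_u = median(u for row in flow for (u, _) in row)
    global_v = median(v for row in flow for (_, v) in row)
    mask = [
        [1 if math.hypot(u - global_u, v - global_v) > threshold else 0
         for (u, v) in row]
        for row in flow
    ]
    return (global_u, global_v), mask


def combine_masks(motion_mask, skin_mask):
    # Keep skin-coloured pixels only inside the independently moving region.
    return [
        [m & s for m, s in zip(motion_row, skin_row)]
        for motion_row, skin_row in zip(motion_mask, skin_mask)
    ]


# Toy 3x3 example: the background drifts upward in the image as the camera
# pans down, while the centre pixel (the athlete) moves differently.
flow = [[(0, -5)] * 3, [(0, -5), (0, 3), (0, -5)], [(0, -5)] * 3]
# Hypothetical skin mask: the athlete plus one spectator in the corner.
skin_mask = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]

global_motion, motion_mask = remove_global_motion(flow)
target_mask = combine_masks(motion_mask, skin_mask)
```

The spectator pixel passes the skin test but moves with the background, so it is discarded, which mirrors the reduction in false skin detections described for Fig. 6d.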
After optical flow processing and skin color detection, the image contains a series of connected regions, and some noise regions are still unavoidable, as shown in Fig. 7. To eliminate as much noise as possible, this paper uses the projection method to further remove noise and segment the moving target as accurately and completely as possible. As the figure shows, the projection method makes full use of the fact that the athlete occupies the main position in the diving video, and it preserves the target information while removing the noise. By contrast, if the noise is to be removed completely, the morphological method is likely to lose part of the target information along with it. As shown in Fig. 7b, completely filtering out the noise causes part of the target human body to be lost. The reason is that athletes wear swimwear: the swimsuit area is not detected as skin, so only thin connections remain between skin regions, and the morphological method treats these thin connections as boundaries and breaks them, losing target information. As shown in Fig. 7c, preserving as much target information as possible during denoising leaves residual noise, which will inevitably affect the subsequent recognition results. In Fig. 7d, the connected regions caused by noise are large, and target information is already lost before the noise removal becomes noticeable. Thus, when the noise is removed completely with the morphological method, much of the target information is lost, which affects the final recognition result. Because the athlete is always in the middle of the screen, the projection method in this paper gives a good noise reduction result: the noise is removed and the target is segmented without losing target information, as shown in Fig. 7e.
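The text does not spell out the projection procedure, so the following is only a sketch of one plausible reading: project the binary mask onto the horizontal and vertical axes, keep the contiguous band with the largest foreground mass in each direction, and clear everything outside it. The function names, the `min_count` parameter (expected to be at least 1), and the toy mask are hypothetical.

```python
def dominant_run(projection, min_count=1):
    """Return (start, end) of the contiguous run of bins >= min_count
    with the largest total, or None if no bin qualifies."""
    best, best_sum, start = None, 0, None
    for i, value in enumerate(list(projection) + [0]):  # sentinel closes the last run
        if value >= min_count and start is None:
            start = i
        elif value < min_count and start is not None:
            total = sum(projection[start:i])
            if total > best_sum:
                best, best_sum = (start, i), total
            start = None
    return best


def projection_denoise(mask, min_count=1):
    """Keep only the foreground inside the dominant column and row bands."""
    empty = [[0] * len(row) for row in mask]
    cols = [sum(row[x] for row in mask) for x in range(len(mask[0]))]
    col_run = dominant_run(cols, min_count)
    if col_run is None:
        return empty
    x0, x1 = col_run
    rows = [sum(row[x0:x1]) for row in mask]
    row_run = dominant_run(rows, min_count)
    if row_run is None:
        return empty
    y0, y1 = row_run
    return [
        [v if (y0 <= y < y1 and x0 <= x < x1) else 0 for x, v in enumerate(row)]
        for y, row in enumerate(mask)
    ]


# Toy mask: a central target with thin parts and two isolated noise pixels.
noisy = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
cleaned = projection_denoise(noisy)
```

Unlike erosion-based filtering, this keeps every foreground pixel inside the selected band untouched, so thin connections within the target are not broken.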
Table 1 shows that the statistical machine learning method, SVM, achieves a good recognition rate, while the template-based method gives poor recognition results for the three postures. The reason is that the SVM method can find the optimal solution from limited sample information, reaching the global optimum and avoiding the local extremum problem. The template matching method is simpler to implement. It first establishes a data sample template for each action and then classifies an action by calculating the similarity between the features of the action under test and the template features. This paper uses Euclidean distance as the similarity measure. However, because the method is strongly affected by differences in body shape and posture between the person under test and the template library, its recognition rate is low.
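The template matching baseline can be sketched as a nearest-template search under Euclidean distance. This is a minimal sketch: the two-dimensional feature vectors and the template values are hypothetical placeholders, and only the posture names are taken from the results table.

```python
import math


def nearest_template(feature, templates):
    """Return (label, distance) of the template closest to feature.

    templates maps each posture label to a list of template feature vectors.
    """
    best_label, best_dist = None, math.inf
    for label, vectors in templates.items():
        for template in vectors:
            dist = math.dist(feature, template)  # Euclidean distance
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical 2-D features; real features would come from the fused
# shape and motion descriptors.
templates = {
    "bending the body": [[0.2, 0.8], [0.3, 0.7]],
    "holding the knee": [[0.9, 0.1]],
}
label, distance = nearest_template([0.8, 0.2], templates)
```

Because the decision depends entirely on how close a test sample lies to the stored templates, athletes whose body shape differs from the template library are easily misclassified, which is consistent with the lower recognition rate reported for this method.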
This research is based on image recognition technology. It takes sports as the research object, identifies the movement process of athletes, and performs image processing on sports videos. Effective information is obtained through image processing and, on this basis, the efficiency of sports training and competition can be further improved. In this study, the target is segmented from the video. Segmentation in space means detecting and segmenting the moving target, that is, separating the independent regions of interest or meaning in the video sequence from the background. After skin color detection, a series of connected regions is obtained, giving a new binary image that contains only the black pixels of these connected regions and the white pixels of the excluded regions. The main research goal of this paper is to identify the types of poses that athletes perform in competition in a complex environment, using feature fusion to characterize the poses with multiple features. To this end, the paper extracts key features of the overall shape and motion of the human body to represent the human motion state. The experimental results show that the proposed technique has some practical value and can be applied to actual competition.
The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
Availability of data and materials
Please contact author for data requests.
GL designed the research framework and wrote the manuscript, and CZ was responsible for proofreading and optimization of the results. Both authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.