
Automatic detection technology of sports athletes based on image recognition technology


To improve motion recognition of athletes based on image recognition technology, this study takes diving athletes as its research material and, building on the current state of image recognition research, studies athlete motion recognition from the perspective of image processing. The gradient segmentation method is used to segment the research object from the video image, the traditional image grayscale method is improved, and an image segmentation algorithm adapted to diving motion is obtained. On this basis, Gaussian mixture background modeling is combined with background subtraction to detect and extract the target human body region, and morphological operators are used to handle noise and holes in the foreground image. The example analysis shows that the proposed method is practical and can provide a theoretical reference for subsequent related research.

1 Introduction

In recent years, the application of computer vision technology in sports video has become a new hot spot. Sports competition is one of the most popular programs, so sports video has become an important type of media data, which has a large audience and a huge application prospect. Image recognition processing of sports video can obtain effective detection results, which has certain effects on athlete training and is conducive to the effective development of the game process. In recent years, researchers at home and abroad have done a lot of work on moving target detection and segmentation and human motion pose recognition. The research content in sports video target detection mainly includes sports field detection, athlete detection, and ball detection. The research scope of gesture recognition includes gait recognition, gesture recognition, and so on.

Target segmentation is a key step in the recognition process; it usually means separating the target of interest from the background in which it appears, as an independent region [1]. Traditional segmentation methods include background subtraction, interframe difference, and optical flow, as well as skin color detection for human detection. Among them, the background difference method is the simplest and most direct: the background model image is subtracted from the image to be segmented, and the difference is the motion region. The core of this method is the construction of the background model [2]. Croston et al. established a statistical background model for each pixel in the picture [3]. Walseth et al. combined pixel color and gradient information to create an adaptive background model [4]. Lu built an adaptive background model based on Kalman filtering [5]. The weakness of the background difference method is that it is easily disturbed by external conditions such as light and weather. The interframe difference method separates foreground from background by calculating the difference between corresponding pixels in two or more frames [6]. Young et al. applied the interframe difference method to detect players on the field during a football match. This method works well for video with a still background, but it cannot accurately determine the target position, nor can it extract the complete target [7]. Neither the background difference method nor the interframe difference method performs well when the background itself is moving. In diving video, the camera usually follows the athlete's descent and moves downward, so the background moves considerably and the segmentation produced by these two methods is poor [8].

The optical flow method calculates the optical flow field between adjacent frames to find the pixel areas corresponding to the moving target, and then combines these areas to form the moving target; the detected target area is usually unreliable [9]. Hal takes advantage of the large difference in direction of movement between the athlete and the background in diving competition, and of the fact that most of the athlete's body is exposed, combining the optical flow method with skin color detection to segment the target [10].

In feature-based algorithms, certain features of the tracking target are used to distinguish it from other objects in a video frame. Some algorithms use a background image, the so-called background frame, as a reference [11]. All objects in the “difference frame” obtained by subtracting the background frame from the current frame are treated as candidate tracking targets [12]. To distinguish the tracking target from other objects, the target is described by its features; parameterized shapes and color distributions in the target representation can serve as such features. A neural network classifier is trained with these features and manually marked tracking targets, and the trained classifier is then used to distinguish the tracking target from other objects [13]. The color histogram in an elliptical area has also been used to track players on the court. These algorithms rely mainly on low-level image information, acquire features in a simple manner, and use coarse features to describe the whole behavior, which makes them sensitive to noise, changes of perspective, and variation between subjects [14].

In recent years, domestic and foreign scholars have done a lot of research on the challenging topic of human behavior analysis, recognition, and understanding [15]. Based on the projection information of the moving target, Guo et al. combined the PCA algorithm to classify and recognize nine different postures such as standing, lying, sitting, and walking. These actions involve the whole body, and the differences between them are large. Cucchiara et al. used Bayesian classifiers to classify four distinct postures: crouching, sitting, standing, and lying. In addition, Hu Changbo et al. recognized six sets of Yang-style Taijiquan movements through PCA modeling [16]. Although these methods achieve a high recognition rate for the postures they target, they were all evaluated in a specific environment with a simple background, a still camera, and slow movement [17]. Leonard et al. used the template matching method to analyze the athlete’s body posture in diving video, but template-based analysis places high demands on the capacity and quality of the template library [18]. Crance used a multi-feature fusion method to extract multiple features from the segmented target image, and used the SVM output probability method to identify and classify three aerial poses.

Through the above review, it is known that the current image detection and image recognition technologies have been developed and have been applied to many industries [19]. This research is based on image recognition technology analysis, taking sports as the research object, identifying the sports process of sports athletes, performing image processing on sports videos, and obtaining effective information through image processing, thus further improving the efficiency of sports training and sports competition.

2 Research methods

From the perspective of segmentation, segmenting a video object in space means detecting and segmenting the moving target. Specifically, it refers to separating the independent regions of interest or meaning in the video sequence from the background. Target segmentation is the most basic part of video motion pose recognition. If the target can be correctly detected and segmented in each frame, correct recognition of the pose becomes possible. However, target detection is subject to many unknown factors, and suppressing these external interferences often comes at the cost of real-time performance.

The motion in a two-dimensional image is the projection onto the imaging plane of the three-dimensional velocity vectors of the visible points in the scene. An estimate of the instantaneous variation of each point across a sequence of images is generally called an optical flow field or a velocity field. Optical flow field calculation methods are generally divided into five types: gradient-based methods, energy-based methods, matching-based methods, phase-based methods, and neurodynamic methods. Among them, the gradient-based method uses the image gray values to calculate the optical flow field. It assumes that the gray value of a moving point remains unchanged between frames and derives the optical flow constraint equation from this assumption; it is the most studied method. However, since the optical flow constraint equation alone does not uniquely determine the optical flow, other constraints need to be introduced. According to the constraints introduced, gradient-based methods can be divided into two categories: global constraint methods and local constraint methods. Typical algorithms are the Horn-Schunck algorithm (global) and the Lucas-Kanade algorithm (local). Of the two, the Lucas-Kanade algorithm offers better accuracy and speed and has strong resistance to noise. Its calculation is described in detail below.

Assume that the point m = (x, y)T on the image has a gray value of I(x, y, t) at time t. After a time interval dt has elapsed, the gray value of the corresponding point is I(x + dx, y + dy, t + dt). When dt → 0, the gray values of the two points can be considered equal, that is,

$$ I\left(x+\mathrm{dx},y+\mathrm{dy},t+\mathrm{dt}\right)=I\left(x,y,t\right) $$

If the gray value of the image changes slowly with x, y, t, then the left side of Eq. (1) can be expanded by Taylor series:

$$ I\left(x+\mathrm{dx},y+\mathrm{dy},t+\mathrm{dt}\right)=I\left(x,y,t\right)+\frac{\partial I}{\partial x} dx+\frac{\partial I}{\partial y} dy+\frac{\partial I}{\partial t} dt+\varepsilon $$

Among them, ε represents an infinitesimal term of second order or higher order. Since dt → 0, the ε in the above equation is ignored, so that

$$ \frac{\partial I}{\partial x} dx+\frac{\partial I}{\partial y} dy+\frac{\partial I}{\partial t} dt=0 $$

Let \( u=\frac{dx}{dt},v=\frac{dy}{dt} \) denote the optical flow in the x and y directions, and let \( {I}_x=\frac{\partial I}{\partial x},{I}_y=\frac{\partial I}{\partial y},{I}_t=\frac{\partial I}{\partial t} \) denote the partial derivatives of the gray value with respect to x, y, and t, respectively. Dividing Eq. (3) by dt then gives

$$ {I}_xu\kern0.5em +\kern0.5em {I}_yv\kern0.5em +\kern0.5em {I}_t=0 $$

The above formula is the basic equation of optical flow. As mentioned above, it contains two unknowns, u and v, which cannot be uniquely determined by one equation alone. To solve this problem, a further constraint is needed. We use the locally constrained Lucas-Kanade algorithm to add the constraint, and use a windowed weighting method to compute the optical flow between two adjacent frames. It is assumed that all points in a small area centered on the point p have the same optical flow, and different points in the area are given different weights, so that the calculation of the optical flow is converted into finding the velocity (u, v) that minimizes Eq. (5):

$$ \sum \limits_{\left(x,y\right)\in \Omega }{W}^2\left(x,y\right){\left({I}_xu+{I}_yv+{I}_t\right)}^2 $$

Among them, Ω represents the neighborhood centered on point p, and \( {W}^2\left(x,y\right) \) represents the window weight function, which gives the weight of each pixel in the neighborhood. A Gaussian function is usually used: the closer a pixel is to p, the larger its weight, so that pixels in the central region of the neighborhood have greater influence than those at the periphery. Lucas-Kanade assumes that the motion vector remains constant over the small spatial neighborhood Ω and then applies weighted least squares: setting the partial derivatives of Eq. (5) with respect to v and u to zero gives Eqs. (6) and (7). The velocity (u, v) is solved from these two equations, which resolves the problem that the optical flow constraint equation alone cannot determine two unknowns.

$$ u\sum \limits_{\left(x,y\right)\in \Omega }{W}^2\left(x,y\right){I}_x{I}_y+v\sum \limits_{\left(x,y\right)\in \Omega }{W}^2\left(x,y\right){I}_y^2+\sum \limits_{\left(x,y\right)\in \Omega }{W}^2\left(x,y\right){I}_y{I}_t=0 $$
$$ u\sum \limits_{\left(x,y\right)\in \Omega }{W}^2\left(x,y\right){I}_x^2+v\sum \limits_{\left(x,y\right)\in \Omega }{W}^2\left(x,y\right){I}_x{I}_y+\sum \limits_{\left(x,y\right)\in \Omega }{W}^2\left(x,y\right){I}_x{I}_t=0 $$
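The two normal equations can be solved directly as a 2 × 2 linear system. The following is a minimal sketch for a single window, not the implementation used in this study. The function name, the window half-size, the Gaussian width, and the use of central differences on the average of the two frames are all assumptions made for illustration.

```python
import math


def lucas_kanade_window(frame0, frame1, cy, cx, half=10, sigma=5.0):
    """Estimate the flow (u, v) at pixel (cy, cx) from two grayscale frames.

    frame0 and frame1 are lists of rows of floats. The window must lie at
    least one pixel inside the image so central differences are defined.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            # Spatial gradients: central differences averaged over both frames
            # (an assumption of this sketch, chosen to reduce linearization error).
            ix = 0.25 * ((frame0[y][x + 1] - frame0[y][x - 1])
                         + (frame1[y][x + 1] - frame1[y][x - 1]))
            iy = 0.25 * ((frame0[y + 1][x] - frame0[y - 1][x])
                         + (frame1[y + 1][x] - frame1[y - 1][x]))
            it = frame1[y][x] - frame0[y][x]  # temporal derivative
            # Gaussian window weight W^2(x, y): larger near the center p.
            w2 = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            sxx += w2 * ix * ix
            sxy += w2 * ix * iy
            syy += w2 * iy * iy
            sxt += w2 * ix * it
            syt += w2 * iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # gradients too uniform in the window to determine (u, v)
    # Cramer's rule for: sxx*u + sxy*v = -sxt and sxy*u + syy*v = -syt
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

On a smooth synthetic texture shifted by a fraction of a pixel, the estimate should land close to the true shift; the remaining error comes mainly from the finite-difference gradients.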

In practice, this method is usually combined with a Gaussian pyramid. Let the original image be I, with I0 = I denoting the layer-0 image (the original image). The pyramid is built recursively, each layer being generated from the layer below it. The actual calculation proceeds from the top layer down to the bottom layer; the experiments in this paper use four layers. When the optical flow increment at a given level has been calculated, it is added to the initial value for that level, and the result is projected to the next layer down, where it serves as the initial value for the optical flow calculation. This process continues until the optical flow of the original image is estimated.

After skin color detection, a series of connected areas is obtained, and a new binary image is formed in which these connected areas are black pixels and all excluded areas are white. Although most of the global motion regions are found and masked out by the Lucas-Kanade method, some are still missed. After skin color detection, these missed areas may be mistaken for skin color and become noise. The reason is that, in addition to the target's skin, the image contains background areas similar in color to skin, such as the faces and clothing of the audience in the stands and other facilities in the scene. To further reduce the interference of these erroneously detected skin color regions with the segmentation result, this paper exploits two characteristics of diving video: the athlete is usually at the center of the frame and occupies a larger proportion of it than the connected areas formed by noise. The projection method is used to determine the rectangular frame containing the moving target.

The specific algorithm is as follows. We project the connected regions in the noisy image in the vertical direction to form a projection of m segments, as shown in Fig. 1, and take the segment of length Lv as the extent of the target rectangle.

Fig. 1 Projection denoising
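The projection step can be sketched as follows. The text does not state how the segment of length Lv is chosen among the m segments, so this sketch assumes it is the longest one, and it repeats the projection horizontally inside that segment to obtain the top and bottom of the rectangle. The function names and the 0/1 mask convention are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries, end exclusive."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


def target_rectangle(mask):
    """Locate the target in a binary mask (list of rows, 1 = detected skin pixel).

    Returns (x0, y0, x1, y1) with exclusive upper bounds, or None if the mask
    is empty. Assumption: the target produces the longest projection segment.
    """
    # Vertical projection: count detected pixels in every column.
    col_profile = [sum(col) for col in zip(*mask)]
    col_runs = runs(col_profile)
    if not col_runs:
        return None
    x0, x1 = max(col_runs, key=lambda s: s[1] - s[0])
    # Horizontal projection restricted to the chosen columns.
    row_profile = [sum(row[x0:x1]) for row in mask]
    y0, y1 = max(runs(row_profile), key=lambda s: s[1] - s[0])
    return x0, y0, x1, y1
```

Small isolated regions produce short projection segments and are discarded without eroding the target itself, which is the property the projection method relies on.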

The main goal of this paper is to identify the types of poses that athletes are doing in a complex environment. Therefore, in the representation of human motion state, this study extracted the key features of the overall shape and motion of the human body. However, whether it is based on the appearance of shape features alone or the use of motion features to characterize people’s motion state, there will be deficiencies. Therefore, this paper uses the idea of feature fusion to represent people’s sports postures with multiple feature fusions. The selected features will be described in detail below.

Different actions produce different distributions of skin color within the rectangular frame. To capture this, the rectangular frame is divided into four blocks in the shape of the character “Tian” (a 2 × 2 grid), numbered from left to right and from top to bottom. The average value of R, G, and B is obtained for each block, as shown in Eq. (8). \( \overline{{\mathrm{f}}_{bR}} \), \( \overline{{\mathrm{f}}_{bG}} \), and \( \overline{{\mathrm{f}}_{bB}} \) denote the averages of R, G, and B in the bth block, where b = {1, 2, 3, 4}, N is the number of pixels in each block, and Ri, Gi, Bi are the R, G, and B values of the ith pixel.

$$ \left\{\begin{array}{c}\overline{{\mathrm{f}}_{bR}}={\sum}_{i=1}^N\frac{R_i}{N}\\ {}\overline{{\mathrm{f}}_{bG}}={\sum}_{i=1}^N\frac{G_i}{N}\\ {}\overline{{\mathrm{f}}_{bB}}={\sum}_{i=1}^N\frac{B_i}{N}\end{array}\right. $$

The color features extracted here are not the color histogram of the entire target, as is usual, but only the averages of the three primary color channels in each block. This is because when the same athlete performs different postures, the color histograms within the rectangular frame that locates the target are very similar, so such color features contribute little to posture recognition. Another advantage of this feature is that it significantly reduces the feature dimension. A color histogram would have a bin for every value from 0 to 255, that is, 256 dimensions, which increases the amount of computation. Figure 2 shows three frames of the diving process.

Fig. 2 Three-frame image of the diving process

The next step is image grayscale processing. First, a mapping function from the color image to grayscale is defined. This function must preserve contrast, keep the mapping continuous and consistent, and preserve the order of brightness; it can be linear or nonlinear. Second, the image is partitioned: the traditional k-means-based segmentation algorithm divides the image into many superpixel blocks for subsequent processing. Third, a color image saliency map and a grayscale image saliency map are generated. The saliency map of the color image is obtained from the color images before and after partitioning through a color image saliency detection algorithm, and the saliency map of the parameterized grayscale image produced by the first step is obtained with the same saliency detection algorithm. Fourth, an energy function is built from the two saliency maps, and the optimal value of the parameters is obtained by optimizing this energy function. Finally, the optimal value is substituted into the parameterized grayscale image to obtain the final grayscale image. The results are shown in Fig. 3.

Fig. 3 Image gray processing results

Let xt denote the value of a pixel at time t. If xt matches an existing Gaussian distribution, say the jth, the weights of the Gaussian distributions are updated as:

$$ {\mathrm{w}}_{i,t}=\left(1-\beta \right){\mathrm{w}}_{i,t-1}+\beta {\mathrm{M}}_{i,t},i=1,2,\dots, K $$
$$ {\mathrm{M}}_{i,t}=\left\{\begin{array}{c}1,i=j\\ {}0,i\ne j\end{array}\right. $$

Among them, β is the weight update rate. The above equation shows that only the weight of the Gaussian distribution matching xt is increased, and the remaining weights are reduced. The parameters of the Gaussian distribution are updated to:

$$ \left\{\begin{array}{c}{\mu}_{j,t}=\left(1-\alpha \right){\mu}_{j,t-1}+\alpha {\mathrm{x}}_t\\ {}{\sigma}_{j,t}^2=\left(1-\alpha \right){\sigma}_{j,t-1}^2+\alpha {\left({\mathrm{x}}_t-{\mu}_{j,t-1}\right)}^T\left({\mathrm{x}}_t-{\mu}_{j,t-1}\right)\end{array}\right. $$

Among them, α is the update rate of the Gaussian distribution parameters; the parameters of Gaussian distributions that are not matched remain unchanged. In establishing the background model, we set the number of Gaussian distributions describing each pixel to K = 3. The background model is initialized first, with initial weights \( {w}_{1,0}=1 \), \( {w}_{2,0}=1 \), \( {w}_{3,0}=1 \). The pixels of the first frame are used to initialize the mean of the first Gaussian distribution, and the means of the remaining Gaussian distributions are 0. The standard deviation of each model takes a relatively large value, \( {\sigma}_{i,0}=30 \), with a weight update rate β = 0.33, a learning rate α = 0.7, and a threshold T = 0.7. If no Gaussian distribution matches xt at detection time, the Gaussian distribution with the lowest priority is removed, a new Gaussian distribution is introduced according to xt and assigned a smaller weight and a larger variance, and the weights are then normalized.
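The per-pixel update can be sketched for a single grayscale pixel as below. Several details are not given in the text and are assumptions of this sketch: the matching rule (within 2.5 standard deviations of the mean), the priority measure (weight divided by standard deviation), the weight given to a replacement component (0.05), and renormalizing the weights after every update. The class and parameter names are hypothetical, and the background decision using the threshold T is not shown.

```python
import math


class PixelGMM:
    """Mixture of K Gaussians for one grayscale pixel (scalar x_t)."""

    def __init__(self, first_value, k=3, sigma0=30.0, alpha=0.7, beta=0.33):
        self.alpha, self.beta, self.sigma0 = alpha, beta, sigma0
        self.w = [1.0] * k                                # initial weights as in the text
        self.mu = [float(first_value)] + [0.0] * (k - 1)  # first mean from frame 1
        self.var = [sigma0 ** 2] * k

    def update(self, x, match_sigmas=2.5, new_weight=0.05):
        """Update the model with pixel value x; return the matched index or None."""
        k = len(self.w)
        # Assumed matching rule: x lies within match_sigmas standard deviations.
        matched = next(
            (j for j in range(k)
             if abs(x - self.mu[j]) <= match_sigmas * math.sqrt(self.var[j])),
            None,
        )
        if matched is None:
            # Assumed priority w / sigma: replace the lowest-priority component
            # with a new one centered on x, small weight, large variance.
            j = min(range(k), key=lambda i: self.w[i] / math.sqrt(self.var[i]))
            self.mu[j], self.var[j], self.w[j] = float(x), self.sigma0 ** 2, new_weight
        else:
            # Weight update: only the matched component gains weight.
            for i in range(k):
                m = 1.0 if i == matched else 0.0
                self.w[i] = (1 - self.beta) * self.w[i] + self.beta * m
            d = x - self.mu[matched]  # difference uses the previous mean
            self.mu[matched] = (1 - self.alpha) * self.mu[matched] + self.alpha * x
            self.var[matched] = (1 - self.alpha) * self.var[matched] + self.alpha * d * d
        total = sum(self.w)
        self.w = [wi / total for wi in self.w]
        return matched
```

For example, a model initialized at 100 and updated with 103 moves its first mean to 0.3 × 100 + 0.7 × 103 = 102.1, which shows how quickly a learning rate of 0.7 tracks the latest frame.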

First, Gaussian mixture background modeling and background subtraction are combined to detect and extract the target human body region, and morphological operators are used to handle noise and holes in the foreground image. By applying an opening followed by a closing, we remove isolated noise points, fill the holes in the target area, and smooth the contour of the human body. Morphological processing leaves the overall position and shape of the target unchanged, but details are easily destroyed when the target is relatively small. The process of the motion region extraction algorithm is shown in Fig. 4.

Fig. 4 Image recognition of athletes
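The opening-then-closing cleanup can be illustrated on a small binary mask. This sketch assumes a 3 × 3 square structuring element and treats pixels outside the image as background; neither choice is specified in the text, and the function names are hypothetical.

```python
def _morph(mask, op):
    """Apply min (erosion) or max (dilation) over each 3 x 3 neighborhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighborhood = [
                mask[j][i] if 0 <= j < h and 0 <= i < w else 0
                for j in (y - 1, y, y + 1)
                for i in (x - 1, x, x + 1)
            ]
            out[y][x] = op(neighborhood)
    return out


def erode(mask):
    return _morph(mask, min)


def dilate(mask):
    return _morph(mask, max)


def opening(mask):
    """Erosion then dilation: removes isolated noise points."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation then erosion: fills small holes."""
    return erode(dilate(mask))


def clean_foreground(mask):
    """Opening followed by closing, as described for the foreground image."""
    return closing(opening(mask))
```

The opening only preserves regions that can contain the whole structuring element, which is why thin details of a small target can be lost, as noted above.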

In order to improve the computational efficiency, the contour image of each player is divided equally into h × w non-overlapping sub-blocks. The normalized value of each sub-block is then calculated as \( {N}_i=\frac{b(i)}{mv},i=1,2,\dots, h\times w \), where b(i) is the number of foreground pixels in the ith block and mv is the maximum of all b(i). The outline of the athlete in the tth frame is then described as ft = [N1, N2,  … , Nh × w], and the outline over the entire video is correspondingly expressed as vf = {f1, f2,  … , fT}. The original outline representation Vr can in fact be regarded as a special case of the block-based features, namely the case in which the block size is 1 × 1 pixel. Because the distance and angle between the person and the camera vary across the video sequence, the apparent size of the human body varies greatly, and the contour needs to be normalized before contour features are extracted. After the target area is extracted, its outline is mapped to a uniform height H and the width is scaled proportionally, as shown in Fig. 5.

Fig. 5 Normalized image
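The block feature \( {N}_i=b(i)/mv \) can be computed as in the sketch below, which assumes the silhouette is a list of rows of 0/1 values whose height and width are divisible by the grid dimensions. The function and parameter names are hypothetical.

```python
def block_features(mask, grid_h, grid_w):
    """Normalized foreground counts N_i = b(i) / mv over a grid_h x grid_w grid.

    mask is a binary silhouette (list of rows, 1 = foreground). Blocks are
    visited left to right, top to bottom.
    """
    rows, cols = len(mask), len(mask[0])
    block_h, block_w = rows // grid_h, cols // grid_w
    counts = []
    for by in range(grid_h):
        for bx in range(grid_w):
            counts.append(sum(
                mask[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ))
    mv = max(counts)  # maximum of all b(i)
    # An empty silhouette has mv == 0; return zeros rather than divide by zero.
    return [c / mv if mv else 0.0 for c in counts]
```

Dividing by the maximum count keeps every component in [0, 1], so silhouettes of different sizes remain comparable once they have been scaled to the common height H.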

3 Results

In order to study the detection effect of image-based automatic detection technology for sports athletes, this paper uses AVI-format video as sample data, all downloaded from the Internet from live broadcasts of men's and women's individual 10 m platform diving competitions. We chose 160 sequences, each of which is a complete dive, usually lasting between 3 and 6 s. We also took 3109 frames containing the desired motion as samples, at a resolution of 480 × 360. During a dive, the athlete usually performs some transitions and connecting movements in addition to one of the three postures. The entire process is short, usually 3–4 s, so the athlete spends even less time on the prescribed posture itself. To identify the actions performed during the dive more accurately, this paper classifies each frame of the sample video.

In the segmentation stage, the optical flow method is used to estimate and eliminate the global motion region caused by camera motion, and then the skin color detection is used to determine the target position in the motion region, as shown in Fig. 6.

Fig. 6 Estimation results of global motion

Figure 7 compares the denoising effect of the projection method used in this paper with that of morphological erosion and dilation.

Fig. 7 Comparative analysis of denoising methods

In the gesture recognition stage, in order to better describe the motion information, this paper uses a variety of feature fusion methods to extract four characteristics such as SIFT. Through experiments, we compare the recognition result of SVM-based methods commonly used in classification and the recognition result of template-based matching methods, as shown in Table 1.

Table 1 Identification results

4 Analysis and discussion

In diving video, the camera usually moves as the athlete descends, resulting in global motion. In the segmentation stage, the optical flow method is used to estimate and eliminate the global motion region caused by camera motion, and skin color detection is then used to determine the target position within the remaining motion region. Eliminating the global motion removes a large amount of noise compared with applying skin color detection directly, as shown in Fig. 6. When skin color detection is applied directly, the background in a diving venue is usually the audience, so many colors in the background, such as the skin of spectators, the scene facilities, and clothing, may be close to the skin color of the target and be falsely detected as target skin, as shown in Fig. 6b. Figure 6c shows the image after the global motion is removed using the optical flow method, and Fig. 6d shows the result of skin color detection after the global motion has been removed. The noise is much reduced compared with Fig. 6b.

After the image is processed by the optical flow method and skin color detection, a series of connected regions is formed, and some noise regions are still unavoidable, as shown in Fig. 7. To eliminate the noise as far as possible, this paper uses the projection method to remove further noise and segment the moving target as accurately and completely as possible. It can be seen from the figure that the projection method makes full use of the fact that the athlete occupies the main position in diving video, and preserves the target information while removing the noise. By contrast, if the noise is to be removed completely, the morphological method is likely to lose some of the target information as well. As shown in Fig. 7b, completely filtering out the noise causes part of the target human body to be lost. The reason is that athletes wear swimwear: the swimsuit is not detected as skin, so only thin connections remain between skin regions, and the morphological method treats these thin connections as boundaries and breaks them, losing target information. As shown in Fig. 7c, if the target information is preserved as far as possible during denoising, noise remains, which will inevitably affect the subsequent recognition results. In Fig. 7d, the connected regions caused by noise are large, and target information is already lost before the noise is noticeably reduced; removing the noise completely would lose much of the target information and degrade the final recognition result. Because the athlete is always near the middle of the frame, the projection method in this paper denoises well: the noise is removed and the target is segmented without losing target information, as shown in Fig. 7e.

It can be seen from Table 1 that the SVM, a statistical machine learning method, achieves a good recognition rate, while the template-based method gives poor recognition results for the three postures. The reason is that the SVM can obtain the optimal solution from limited sample information, reaching the global optimum and avoiding the local extremum problem. The method based on template matching is simpler to implement. It first establishes a data sample template for each action, and then classifies an action by calculating the similarity between the feature of the action to be tested and each template feature. This paper uses the Euclidean distance as the similarity measure. However, since the method is strongly affected by differences in body shape and posture between the person being tested and the template library, its recognition rate is low.
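The template-matching baseline amounts to a nearest-template rule under Euclidean distance. A minimal sketch, assuming one template feature vector per action; the labels `pose_a`, `pose_b`, and `pose_c` are placeholders, since the three postures are not named here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_by_template(feature, templates):
    """Return the label of the template closest to the feature vector.

    templates maps a label to its template feature vector.
    """
    return min(templates, key=lambda label: euclidean(feature, templates[label]))


# Hypothetical two-dimensional templates, for illustration only.
example_templates = {
    "pose_a": [0.0, 0.0],
    "pose_b": [1.0, 1.0],
    "pose_c": [0.0, 1.0],
}
```

Because every test feature is compared against a fixed template, any systematic difference in body shape between the athlete and the template shifts all distances, which is consistent with the low recognition rate reported for this method.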

5 Conclusion

This research is based on image recognition technology, taking sports as the research object, identifying the sports process of sports athletes, and performing image processing on sports videos. At the same time, this paper obtains effective information through image processing, and on this basis, further improves the efficiency of sports training and sports competition. In this study, the image is segmented from the video. Segmentation in space is the detection and segmentation of the moving target. Specifically, it separates the independent regions of interest or meaning in the video sequence from the background. After the skin color detection, a series of connected regions are obtained, and the image only contains the black pixels of these connected regions and other excluded white regions, and a new binary image is obtained. The main research goal of this paper is to identify the types of poses that athletes make in the game in a complex environment, and use the idea of feature fusion to characterize the poses of people with multiple features. Therefore, in the representation of human motion state, this paper extracts the key features of the overall shape and motion of the human body. The experimental research shows that the technology proposed in this study has certain practical effects and can be applied to the actual competition.


  1. Y. Kong, Z. Wei, S. Huang, Automatic analysis of complex athlete techniques in broadcast taekwondo video. Multimed Tools Appl. 77(11), 1–18 (2017).

  2. W.H. Gageler, S. Wearing, D.A. James, Automatic jump detection method for athlete monitoring and performance in volleyball. Int. J. Perform. Anal. Sport 15(1), 284–296 (2015).

  3. R. Abächerli, R. Schmid, R. Kobza, O-6 Automatically executed Seattle criteria lead to six percent of abnormal resting ECGs in young swiss males. Br. J. Sports Med. 50(Suppl 1), A3.2–A3A4 (2016).

  4. Z. Mahmood, S. Khattak, S. Khattak, et al., Automatic player detection and identification for sports entertainment applications. Pattern. Anal. Applic. 18(4), 971–982 (2015).

  5. T.J. Gabbett, Quantifying the physical demands of collision sports: does microsensor technology measure what it claims to measure? J. Strength Cond. Res. 27(8), 2319 (2013).

  6. X. Bai, T. Zhang, C. Wang, et al., A fully automatic player detection method based on one-class SVM. IEICE Trans. Inf. Syst. 96(2), 387–391 (2013).

  7. D. Sajber, J. Rodek, Y. Escalante, et al., Sport nutrition and doping factors in swimming; parallel analysis among athletes and coaches. Coll Antropol 37(2), 179–186 (2013).

  8. G. Liang, P. Shivakumara, T. Lu, et al., Multi-spectral fusion based approach for arbitrarily oriented scene text detection in video images. IEEE Trans. Image Proc. Publ. IEEE Sign. Proc. Soc. 24(11), 4488–4501 (2015).

    Article  MathSciNet  Google Scholar 

  9. J.D. Vescovi, Impact of maximum speed on Sprint performance during high-level youth female field hockey matches: female athletes in motion (FAiM) study[J]. Int. J. Sports Physiol. Perform. 9(4), 621–626 (2014).

    Article  Google Scholar 

  10. P. Li, Y. Zhu, Research on burning zone detection method based on flame image recognition for ceramic roller kiln. Appl. Mech. Mater. 602–605, 1761–1767 (2014).

    Google Scholar 

  11. J. Sun, C. Li, In-pit coal mine personnel uniqueness detection technology based on personnel positioning and face recognition. Int. J. Min. Sci. Technol. 23(3), 357–361 (2013).

    Article  Google Scholar 

  12. B. Wang, Y.B. Gao, X.T. Lu, Research on anti-camouflaged target system based on spectral detection and image recognition. Spectrosc. Spectr. Anal. 35(5), 1440 (2015).

    Google Scholar 

  13. R. Mooney, G. Corley, A. Godfrey, et al., Analysis of swimming performance: perceptions and practices of US-based swimming coaches. J. Sports Sci. 34(11), 997–1005 (2016).

    Article  Google Scholar 

  14. A.F. Hani, D. Kumar, A.S. Malik, et al., Non-invasive and in vivo assessment of osteoarthritic articular cartilage: A review on MRI investigations. Rheumatol. Int. 35(1), 1–16 (2015).

    Article  Google Scholar 

  15. L. Sun, J. Xing, Z. Wang, et al., Virtual reality of recognition technologies of the improved contour coding image based on level set and neural network models. Neural Comput. Applic. 29(5), 1311–1330 (2018).

    Article  Google Scholar 

  16. E. Chamard, L. Henry, Y. Boulanger, et al., A follow-up study of neurometabolic alterations in female concussed athletes. J. Neurotrauma 31(4), 339–345 (2014).

    Article  Google Scholar 

  17. L. Anderson, P. Orme, R.D. Michele, et al., Quantification of seasonal long physical load in soccer players with different starting status from the English Premier League: implications for maintaining squad physical fitness. Int. J. Sports Physiol. Perform. 11(8), 1038–1046 (2016).

    Article  Google Scholar 

  18. B. Najafi, J. Leeeng, J.S. Wrobel, et al., Estimation of center of mass trajectory using wearable sensors during golf swing. J. Sports Sci. Med. 14(2), 354 (2015).

    Google Scholar 

  19. Peng L, Liu S, Liu R, et al. Effective Long short-term Memory with Differential Evolution Algorithm for Electricity Price Prediction[J]. Energy. 2018.

Download references


Acknowledgements

The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.


Not applicable.

Availability of data and materials

Please contact the author for data requests.

Author information


Authors' contributions

GL designed the research framework and wrote the manuscript, and CZ was responsible for proofreading and optimization of the results. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Guangjing Li.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Li, G., Zhang, C. Automatic detection technology of sports athletes based on image recognition technology. J Image Video Proc. 2019, 15 (2019).
