Remote sensing image mosaic technology based on SURF algorithm in agriculture

Unmanned aerial vehicle (UAV) remote sensing is a low-altitude remote sensing technology. Owing to its fast acquisition, high resolution, low cost, and good security, it has been widely used in military, agricultural, medical, geographical mapping, and other fields. However, limited by the flying height of the UAV and the focal length of the digital camera, a single UAV image cannot give an overall view of a farmland area. To expand the field of view, multiple single images acquired by the UAV must be mosaicked into a complete panoramic image of the farmland. In this paper, to address the problem of stitching UAV low-altitude remote sensing images, an image mosaic technique based on Speeded-Up Robust Features (SURF) is introduced to achieve rapid image stitching. One hundred fifty farmland remote sensing images collected by UAV are used as experimental objects, and the stitching is completed by a global stitching strategy optimized with the Levenberg-Marquardt (L-M) algorithm. Experiments show that the strategy effectively reduces the influence of cumulative errors and achieves automatic panoramic mosaicking of the survey area.


Introduction
Remote sensing images have become an important means of information acquisition. High-resolution remote sensing images acquired by satellites, airplanes, and radars can be analyzed to obtain the information people need. With social and economic development and the needs of national defense construction, the demand for high-resolution remote sensing images and basic geographic information in all sectors of society is increasingly urgent, and the requirements placed on them keep rising. Remote sensing data acquired by satellites, airplanes, and radars alone can hardly satisfy the needs of image data acquisition and processing. Compared with traditional aerospace remote sensing, UAV low-altitude remote sensing is a new technology with advantages such as high flexibility, easy operation, high resolution, and low investment, and it has become one of the methods for obtaining remote sensing data [1]. UAV remote sensing combines remote sensing application technology, telemetry and remote control technology, remote sensing sensor technology, communication technology, GPS positioning technology, and unmanned aerial vehicle technology to acquire spatial remote sensing information (for example, of earthquake disaster areas, natural environments, and land resources) rapidly, intelligently, and automatically, and to complete data processing, modeling, and application analysis [2]. Because UAVs are light, structurally strong, inexpensive to buy and operate, easy to maintain, and highly compatible in hardware, their application to low-altitude remote sensing data acquisition is becoming more and more extensive.
At present, small UAVs are mainly applied to disaster assessment, disaster relief, flood control, environmental monitoring and management, urban management, resource management, and the assessment of dynamic land changes [3].
The application of remote sensing imagery to agricultural work has become a trend in the development of remote sensing technology and agricultural engineering. Remote sensing imagery serves as a source of information, and processing it provides support for optimal agricultural production decisions [4]. Because crops grow quickly and production demands change rapidly, images often have to be obtained within a specified period of time. Compared with manned aerial remote sensing and satellite remote sensing, a UAV can carry out tasks quickly, has flexible flight times, is convenient and fast to operate, and can ensure the acquisition of dynamic data. High-resolution resource satellites have long revisit periods and poor timeliness, so they cannot deliver data for a specified area in a short time. In addition, UAV remote sensing is better suited to image acquisition over small areas. Because it provides images with high spatial and temporal resolution at low cost, the UAV is well suited to the agricultural field and has potential application value there. Introducing UAVs into agriculture can greatly improve the level of agricultural modernization and has long-term significance for promoting the development of modern agriculture [5].
However, the remote sensing images acquired by UAV have a small frame size and come in large quantities [6]. To get the global information of a whole region, the UAV remote sensing images must be mosaicked and synthesized. Image mosaic is a technique that combines several images with overlapping parts (the images may be obtained at different times, from different viewing angles, or by different sensors) into a large-scale, seamless, high-resolution image [7]. There are many image mosaic methods, and their individual steps differ, but the general process is the same.
Generally speaking, it includes the following five steps [8]: (1) Image preprocessing. This covers the basic operations of digital image denoising, histogram processing, and the creation of image matching templates. (2) Image matching. Image matching is the key to image mosaic technology. It uses a matching strategy to find the position in the reference image that corresponds to a template or feature point in the image to be stitched, and thereby determines the transformation relationship between the two images. (3) Establishing a transformation model. From the correspondence between templates or image features, the value of each parameter in the mathematical model is calculated, which establishes the mathematical transformation model between the two images. (4) Unified coordinate transformation. Using the established transformation model, the image to be stitched is converted into the coordinate system of the reference image. (5) Fusion reconstruction. The overlapping regions of the images to be stitched are fused to obtain a seamless, reconstructed panoramic image. According to the stitching purpose, stitching accuracy, and stitching speed required, the stitching of UAV low-altitude remote sensing images can be divided into four types: quick mosaic with seam, panoramic image mosaic, uncontrolled orthophoto mosaic, and controlled orthophoto mosaic [9]. The comparison of the four types of mosaic technology is shown in Table 1.
Table 1 Comparison of mosaic types
Panoramic image mosaic
  Advantages: Fast and seamless mosaic can be achieved within the flight strip; the deformation of objects is small, the relative position relationship is more accurate, and the overall mosaic speed is fast.
  Applications: Emergency and disaster monitoring.
Uncontrolled orthophoto mosaic
  Advantages: The mosaic results have high precision, and no mosaic cracks or distortions are found after mosaicking. The images also contain spatial coordinate information.
  Disadvantages: The accuracy of the elevation information of the image is not high.
  Applications: Change detection for urban and rural areas; orthophoto production for areas that are difficult to access.
Controlled orthophoto mosaic
  Advantages: Greatly reduces the number of field control points; clustered processors quickly perform image matching, automatic DEM generation, and other operations, with the highest accuracy.
  Disadvantages: The production cycle is relatively long.
  Applications: Rapid 3D modeling, resource monitoring with high timeliness, and dynamic monitoring of large-scale engineering construction.

The panoramic image mosaic forms a large panoramic image map from multiple images acquired by UAV. It achieves fast and seamless image stitching, the object deformation is small, the relative position relationship is more accurate, and the overall mosaic speed is fast. Through subsequent hyperspectral image analysis and other processing, it can be used to estimate the planting area of agricultural crops and for growth monitoring and pest and yield monitoring. Panoramic image stitching usually uses the classic SIFT (scale-invariant feature transform) algorithm for image matching. To overcome the interference of translation, rotation, and scale change between the images to be stitched, the SIFT algorithm matches the features of two different images by extracting invariant feature points, and forms a high-quality panoramic image by matching a number of corresponding points (same-name points) between UAV remote sensing images [10]. However, this algorithm is computationally expensive, and it cannot accurately identify and extract feature points from images with blurred edges or few feature points, nor clearly identify edges and outlines. At the same time, some of the extracted points, especially the corners of objects in the image, cannot be registered, and the accuracy of the extraction is not high. In addition, low-altitude remote sensing images are strongly affected by air flow, which tends to cause large tilt angles and irregular overlaps and therefore large image distortion. The instability of the photographic attitude also causes errors to accumulate during continuous stitching and, in severe cases, distorts local edges.
In view of these characteristics, this paper adopts image mosaicking based on SURF features and uses UAV remote sensing image data to verify the algorithm. SURF is an improved algorithm based on SIFT. To accelerate computation, SURF replaces the Gaussian filtering of SIFT with box filters evaluated on an integral image and uses Haar wavelet responses to describe feature points. Feature points are detected with the Hessian matrix, which increases their robustness. As a result, the SURF algorithm greatly improves the speed of feature extraction and the stability of the feature points.

Method: image feature extraction and mosaic algorithm

SIFT algorithm
SIFT is a feature matching algorithm proposed by D. Lowe in 2004 [11]. It is based on invariant features and provides a point feature registration algorithm that remains invariant under translation, rotation, and scaling of images. A brief process of extracting feature points from images using the SIFT algorithm is shown in Fig. 1.
1. Detect scale-space extrema. The original image is convolved with Gaussian kernels of varying scale to obtain its representation sequence in multi-scale space [12]:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$

where $I(x, y)$ is the gray value of the image at $(x, y)$, $*$ denotes convolution, $\sigma$ is the Gaussian scale factor, and $G(x, y, \sigma)$ is the scale-variable Gaussian function, defined as

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} e^{-(x^{2} + y^{2})/(2\sigma^{2})}$$

Finally, the spatial extrema are extracted from these sequences as candidate feature points.

2. Locate and filter feature points. The scale-space function $D(x, y, \sigma)$ (the difference of Gaussians, which approximates the Laplacian of Gaussian) is fitted by a continuous function. It is expanded as a Taylor series at the local extremum point $(x_0, y_0, \sigma_0)$, and the derivative of the expansion is set to 0, which gives the offset $\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$ of the extremum, with $\mathbf{x} = (x, y, \sigma)^{T}$. Substituting $\hat{\mathbf{x}}$ into the expansion gives the extremum value of the original function:

$$D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D^{T}}{\partial \mathbf{x}} \hat{\mathbf{x}}$$

When the offset in any direction is greater than 1/2, the center of the interpolation has shifted to a neighboring point, and such extreme points are discarded. Finally, to enhance noise resistance and improve stability, the Hessian matrix is used to eliminate unstable edge response points.

3. Assign a main direction to each feature point. After the feature points are determined, a main direction is assigned to each of them so that it has rotation invariance. The main direction of a feature point is calculated from [13]

$$m(x, y) = \sqrt{\left(L(x+1, y) - L(x-1, y)\right)^{2} + \left(L(x, y+1) - L(x, y-1)\right)^{2}}$$

$$\theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$$

where $m(x, y)$ denotes the gradient modulus at the feature point $(x, y)$, $\theta(x, y)$ denotes its gradient direction, and $L(x, y)$ denotes the Gaussian image at the scale on which the feature point is located. For each feature point, a histogram is used to count the gradient distribution of the pixel gray values in the neighborhood centered on it, and the peak of the histogram determines the main direction of the feature point.
In addition, to compensate for the instability of feature points that lack affine invariance, the contribution of each sampling point to the gradient histogram is weighted; the weight is determined by the gradient modulus of the sampling point and a Gaussian weight.

4. Construct the key point descriptor. After the above three steps, each detected feature point carries three pieces of information, $((x, y), \sigma, \theta)$: position, scale, and direction. Because the descriptor of a feature point is related to its scale, the descriptor is generated in the Gaussian pyramid space of the corresponding scale [14]. The neighborhood centered on the feature point is divided into $B_p \times B_p$ sub-blocks, and the edge of each sub-block is $m\sigma$ pixels long. The descriptor is constructed as follows. First, taking the feature point as the center, the image in the $m\sigma(B_p + 1)\sqrt{2} \times m\sigma(B_p + 1)\sqrt{2}$ neighborhood of the feature point is rotated by $\theta$ to align it with the main direction. Then, again taking the feature point as the center, an image block of size $m\sigma B_p \times m\sigma B_p$ is selected and divided at equal intervals into $B_p \times B_p$ sub-blocks, and a gradient histogram is used to accumulate the gradients of all pixels in each sub-block in eight directions, which forms a seed point. With $B_p = 4$, the $4 \times 4$ seed points with eight directions each form a 128-dimensional feature vector. In addition, while constructing the descriptor, all pixels in the neighborhood of the feature point are weighted by a Gaussian, and the descriptor is normalized twice to remove the influence of illumination and other factors [15].
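
The main-direction step (step 3) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `dominant_orientation`, the window `radius`, the Gaussian `sigma`, and the 36-bin histogram are assumptions chosen for illustration, and the input `L` is assumed to be an already smoothed grayscale image indexed as `L[row, column]`.

```python
import numpy as np

def dominant_orientation(L, x, y, radius=4, sigma=2.0, bins=36):
    """Return the main direction (radians, in [0, 2*pi)) at column x, row y.

    L is assumed to be a Gaussian-smoothed grayscale image (2-D float array).
    radius, sigma and bins are illustrative choices, not values from the paper.
    """
    hist = np.zeros(bins)
    h, w = L.shape
    # Visit the neighborhood, staying one pixel away from the image border.
    for v in range(max(1, y - radius), min(h - 1, y + radius + 1)):
        for u in range(max(1, x - radius), min(w - 1, x + radius + 1)):
            dx = L[v, u + 1] - L[v, u - 1]          # central difference in x
            dy = L[v + 1, u] - L[v - 1, u]          # central difference in y
            m = np.hypot(dx, dy)                    # gradient modulus
            theta = np.arctan2(dy, dx) % (2 * np.pi)  # gradient direction
            # Gaussian weight: samples far from the feature point count less.
            weight = np.exp(-((u - x) ** 2 + (v - y) ** 2) / (2 * sigma ** 2))
            hist[int(theta / (2 * np.pi) * bins) % bins] += weight * m
    peak = int(np.argmax(hist))
    return (peak + 0.5) * 2 * np.pi / bins          # center of the peak bin
```

The returned angle is quantized to the center of the strongest histogram bin; a full SIFT implementation would additionally interpolate the peak and may keep secondary peaks.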

SURF algorithm
This paper uses image stitching based on SURF features. The basic process is image preprocessing, SURF feature matching, transformation parameter estimation, L-M-optimized global stitching, and image fusion.

Image preprocessing
Because the illumination of the original remote sensing image is uneven, the background gray level of the image is uneven as well. Therefore, to achieve the best binarization result, the image is first divided into blocks. For each block, a threshold is determined from the valley of its gray histogram, and the block is then binarized with that threshold.
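
A minimal sketch of block-wise binarization is given below. The paper does not state how the histogram valley is located, so this sketch swaps in Otsu's method, which picks the threshold that best separates the two histogram modes of each block. The function names and the block size are hypothetical, and the input is assumed to be an 8-bit grayscale array.

```python
import numpy as np

def otsu_threshold(block):
    """Otsu threshold of an 8-bit block (a stand-in for histogram-valley search)."""
    hist = np.bincount(block.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                    # pixel count at or below each level
    w1 = hist.sum() - w0                    # pixel count above each level
    sum0 = np.cumsum(hist * levels)
    mu0 = sum0 / np.maximum(w0, 1)          # mean of the dark class
    mu1 = (sum0[-1] - sum0) / np.maximum(w1, 1)  # mean of the bright class
    between = w0 * w1 * (mu0 - mu1) ** 2    # between-class variance (unnormalized)
    return int(np.argmax(between))

def binarize_by_blocks(gray, block=64):
    """Binarize each block with its own threshold to cope with uneven illumination."""
    out = np.zeros_like(gray, dtype=np.uint8)
    for r in range(0, gray.shape[0], block):
        for c in range(0, gray.shape[1], block):
            tile = gray[r:r + block, c:c + block]
            t = otsu_threshold(tile)
            out[r:r + block, c:c + block] = (tile > t).astype(np.uint8)
    return out
```

Because every block gets its own threshold, a dark block and a bright block are each split into foreground and background, which a single global threshold could not do.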

SURF feature matching
The SURF algorithm builds on the idea of the SIFT algorithm. To speed up feature extraction and matching, SURF approximates the Gaussian filters with box filters evaluated on an integral image, while keeping scale and rotation invariance as well as good distinctiveness and robustness. SURF feature matching is divided into two steps: SURF feature extraction and SURF feature matching.

1) SURF feature extraction
Scale-space construction and extreme point detection. The integral image of the input image is convolved with approximate Gaussian (box) filters whose size increases layer by layer to obtain a pyramid of response images. The determinant of the Hessian matrix is calculated to obtain the response value of each candidate feature point. By non-maximum suppression, each pixel in the scale space is compared with its 26 neighbors in the same layer and the two adjacent layers to obtain local extreme points. A Taylor expansion of the three-dimensional quadratic function is then used for surface fitting to locate the interest points accurately.
Main direction determination. The main direction of a feature point in the SURF algorithm is determined from the Haar wavelet responses and other information in the neighborhood of the feature point.
Feature description vector generation. On the scale image of the feature point, the coordinate axes are rotated to the main direction of the feature point, and the feature point is taken as the center. In a way similar to the SIFT construction of the feature description vector, a 64-dimensional feature vector is obtained.
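
The reason box filters are cheap can be shown with a short NumPy sketch: once the integral image is built, the sum over any rectangle costs four lookups regardless of its size. The function names are hypothetical. The 0.9 weight in `hessian_response` is the value commonly used in the SURF literature to balance the box-filter approximation; it is not stated in this paper.

```python
import numpy as np

def integral_image(img):
    """Integral image with a zero first row and column, so ii[r, c] = img[:r, :c].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0, dtype=np.float64), axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def hessian_response(dxx, dyy, dxy, w=0.9):
    """Approximate Hessian determinant from box-filter responses Dxx, Dyy, Dxy."""
    return dxx * dyy - (w * dxy) ** 2
```

In a full detector, `dxx`, `dyy`, and `dxy` would themselves be assembled from several `box_sum` calls per pixel and per filter size, and the responses would then go through the 26-neighbor non-maximum suppression described above.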

2) SURF feature matching
Here, the Euclidean distance between the SURF feature vectors of two feature points is used as the similarity measure for feature matching. Let the feature vectors of the feature points $P_A$ and $P_B$ be $D_{vA}$ and $D_{vB}$, respectively. The distance between the two points is defined as [16]

$$D(P_A, P_B) = \sqrt{\sum_{i=1}^{64} \left(D_{vA}(i) - D_{vB}(i)\right)^{2}}$$

The matching process is as follows. For each feature point to be matched, a matching point search algorithm finds the feature points with the minimum and the second minimum distance to it. When the ratio of the minimum distance to the second minimum distance is smaller than a preset threshold, the match is considered successful.
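
The distance-ratio rule can be sketched with a brute-force search in NumPy. The function name and the threshold value 0.7 are assumptions for illustration (the paper does not give its threshold), and `desc_b` is assumed to contain at least two descriptors.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.7):
    """Match rows of desc_a to rows of desc_b with the nearest/second-nearest ratio test.

    desc_a: (Na, D) array, desc_b: (Nb, D) array with Nb >= 2.
    ratio is an illustrative threshold, not a value taken from the paper.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)   # Euclidean distance to every candidate
        order = np.argsort(dist)
        nearest, second = order[0], order[1]
        # Accept only if the best match is clearly better than the runner-up.
        if dist[nearest] < ratio * dist[second]:
            matches.append((i, int(nearest)))
    return matches
```

A descriptor whose two closest candidates are almost equally distant is ambiguous and is dropped, which removes many mismatches before the RANSAC stage.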

The transformation parameter estimation
The perspective transformation between the homogeneous image coordinates $X = (x, y, 1)^{T}$ and $X' = (x', y', 1)^{T}$ is expressed as

$$X' \sim HX, \quad H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}$$

where $\sim$ denotes equality up to a nonzero scale factor, so $H$ has eight degrees of freedom.
Since the initial matches generally contain some mismatched pairs, the RANSAC algorithm, constrained by the perspective transformation, is first used to purify the matching pairs, and the least squares method is then used to estimate the parameters of the transformation matrix. The process first calculates the transformation matrix H linearly from six point pairs selected at random from the initial matches and then calculates, for each remaining point pair, the distance between the transformed point and its match. A pair whose distance is below the threshold is an inlier, and a pair whose distance is above the threshold is an outlier. The procedure is iterated, and the set with the most inliers is taken as the correct, purified set of point pairs.
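
The purification step can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' code: the function names, the 3-pixel threshold, the iteration count, and the fixed random seed are assumptions, while the sample size of six follows the text. The linear estimate uses the direct linear transform (DLT) solved by SVD, with $h_{33}$ normalized to 1.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares (DLT) homography mapping src -> dst; both are (N, 2), N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)       # right singular vector of the smallest singular value
    return H / H[2, 2]             # fix the free scale (assumes h33 != 0)

def project(H, pts):
    """Apply H to (N, 2) points and return the (N, 2) inhomogeneous result."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, sample_size=6, threshold=3.0, iterations=500, seed=0):
    """Return (H, inlier_mask); threshold is in pixels and is an illustrative value."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        idx = rng.choice(len(src), size=sample_size, replace=False)
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)  # transfer distance
        inliers = err < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("not enough inliers to estimate a homography")
    # Final least-squares estimate from all inliers of the best hypothesis.
    return fit_homography(src[best], dst[best]), best
```

Degenerate random samples (for example, nearly collinear points) simply produce a poor hypothesis with few inliers and are outvoted by later samples.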

L-M-optimized global stitching
To reduce the cumulative error caused by photographic attitude, terrain relief, and other factors [17], this paper adopts an L-M-optimized global stitching strategy. Based on the idea of minimizing the mean square distance between same-name points, the method dynamically adjusts the transformation parameters of each image during stitching to reduce the effect of cumulative error and achieve global optimization.
The principle of this optimization strategy is to first select the reference plane, read the single images in turn, and then optimize the transformation matrices from all images to the reference plane simultaneously, so that the error of transforming each image to the reference plane is the smallest. For a pair of same-name feature points $(x_i, x_j)$, $x_i$ is projected onto the reference plane and from there onto the adjacent image, which gives the coordinate $x_i'$. The distance difference is

$$d_{ij} = \lVert x_j - x_i' \rVert = \lVert x_j - H_j^{-1} H_i x_i \rVert \qquad (6)$$

where $H_i$ is the transformation matrix that projects the feature point of image $I_i$ onto the reference plane, and $H_j^{-1}$ is the transformation matrix from the reference plane back to image $I_j$.
Summing the distance difference of formula (6) over all images that overlap the current image $I_i$ gives the overall optimization objective [18]:

$$E = \sum_{i=1}^{n} \sum_{j \in M(i)} \sum_{k \in F(i,j)} f\left(d_{ij}^{k}\right)$$

where $n$ is the total number of images; $F(i, j)$ is the set of matching pairs between image $I_i$ and image $I_j$; $d_{ij}^{k}$ is the distance difference calculated from the $k$-th matching point between image $I_i$ and image $I_j$; $M(i)$ is the set of images that overlap $I_i$; and $f(x)$ is the error function of [19]. The optimal transformation matrices are determined iteratively using the L-M algorithm [20]. The basic process is as follows:
1) Perform SURF matching between all images, obtain the matching pairs, and establish an image adjacency relationship table.
2) Automatically select the reference plane. This paper uses the image $I_i$ with the largest weight $T$ in the image sequence as the reference plane. The weight $T$ [21] is computed from $N$, the number of images that overlap image $I_i$ and share same-name point pairs with it; $S$, the area of the overlap region; and $n$, the number of matching points in the overlap region. The larger the value of $T$, the larger the weight of image $I_i$ in the image sequence; the image with the largest $T$ is taken as the reference plane.
3) Chain the transformation matrices of adjacent images within the flight strip to establish the approximate registration relationship between each photo and the reference image, which serves as the initial value for the subsequent L-M iterations.
4) Add the best adjacent image in sequence according to the image adjacency relationship table. To minimize the error when each image is transformed to the reference plane, use the L-M algorithm to optimize the transformation matrix of each image to the reference plane.
5) Continue to add images and optimize the transformation matrices until all images have been added, and finally output the result.
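
The L-M iteration used in steps 3) to 5) can be illustrated on a reduced problem. The sketch below refines the eight free parameters of a single homography by minimizing squared transfer distances; the paper's global strategy optimizes the matrices of all images jointly and applies the error function $f$, which this sketch omits. All names, the finite-difference Jacobian, and the damping schedule are assumptions made for illustration.

```python
import numpy as np

def residuals(params, src, dst):
    """Stacked x/y transfer errors for H built from 8 parameters (h33 fixed to 1)."""
    H = np.append(params, 1.0).reshape(3, 3)
    p = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    return (p[:, :2] / p[:, 2:3] - dst).ravel()

def numeric_jacobian(fun, params, eps=1e-6):
    """Forward-difference Jacobian of fun at params."""
    r0 = fun(params)
    J = np.empty((r0.size, params.size))
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = eps * max(1.0, abs(params[k]))
        J[:, k] = (fun(params + step) - r0) / step[k]
    return J

def levenberg_marquardt(fun, params, iterations=50, damping=1e-3):
    """Minimize sum(fun(params)**2); returns (params, cost)."""
    params = np.asarray(params, dtype=float)
    cost = np.sum(fun(params) ** 2)
    for _ in range(iterations):
        r = fun(params)
        J = numeric_jacobian(fun, params)
        A = J.T @ J
        g = J.T @ r
        # Damped normal equations with Marquardt's diagonal scaling.
        step = np.linalg.solve(A + damping * np.diag(np.diag(A)), -g)
        new_cost = np.sum(fun(params + step) ** 2)
        if new_cost < cost:    # accept the step and move toward Gauss-Newton
            params, cost, damping = params + step, new_cost, damping / 10
        else:                  # reject the step and move toward gradient descent
            damping *= 10
    return params, cost
```

In the global setting, `params` would stack the parameters of every image except the reference, the chained matrices of step 3) would provide the starting point, and `fun` would return the distance differences of formula (6) for all matching pairs.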

Image fusion
Once the geometric relationship between the images is determined, this paper uses a weighted average fusion method based on a Gaussian model to blend the multiple images into a panoramic image, which keeps the color visually consistent and the transitions smooth.
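
A minimal sketch of Gaussian-weighted average fusion is shown below. The paper does not specify the weight model in detail, so this sketch assumes a weight that falls off as a Gaussian from the center of each image. For brevity, it also assumes that the images are grayscale and already registered up to an integer `(top, left)` offset on the canvas rather than warped by a full perspective transformation. The function names and `sigma_scale` are hypothetical.

```python
import numpy as np

def gaussian_weight(height, width, sigma_scale=0.5):
    """Weight map that is largest at the image center and decays toward the edges."""
    ys = np.arange(height) - (height - 1) / 2
    xs = np.arange(width) - (width - 1) / 2
    wy = np.exp(-ys ** 2 / (2 * (sigma_scale * height) ** 2))
    wx = np.exp(-xs ** 2 / (2 * (sigma_scale * width) ** 2))
    return np.outer(wy, wx)

def blend(canvas_shape, images, offsets):
    """Weighted average of grayscale images placed at (top, left) offsets on a canvas."""
    acc = np.zeros(canvas_shape, dtype=float)    # sum of weight * intensity
    wsum = np.zeros(canvas_shape, dtype=float)   # sum of weights
    for img, (top, left) in zip(images, offsets):
        h, w = img.shape
        wgt = gaussian_weight(h, w)
        acc[top:top + h, left:left + w] += wgt * img
        wsum[top:top + h, left:left + w] += wgt
    out = np.zeros(canvas_shape)
    covered = wsum > 0                           # pixels seen by at least one image
    out[covered] = acc[covered] / wsum[covered]
    return out
```

In the overlap region, each pixel is dominated by the image whose center is closer, so intensity changes gradually from one image to the next instead of jumping at a seam.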

Experimental data
UAV remote sensing uses a low-altitude UAV as a platform. Depending on the task, different instruments are used to measure the surface remotely and obtain the necessary surface information. Commonly used instruments include synthetic aperture radar, scanners, and CCD digital cameras. In general, a UAV equipped with a CCD digital camera can acquire surface images flexibly and with high image accuracy. However, since a low-altitude flight cannot capture the whole target area in one image, the captured images need to be stitched to form a panoramic image. This paper uses a UAV-mounted Canon camera with a 24 mm fixed-focus lens to capture image data of rural farmland. The weather was clear and calm, the UAV flew smoothly, and the images obtained are clear. Figure 2 shows a group of adjacent images from the remote sensing images of rice fields obtained by the UAV. It can be seen from the figure that adjacent images overlap over a large area.

Comparison of image feature point extraction results
This paper compares and analyzes the feature point matching and stitching processes of the SIFT and SURF algorithms for adjacent images to be stitched. Figure 3 shows the feature points detected on adjacent images by the two algorithms: the upper part of Fig. 3 shows the feature points detected by the SIFT algorithm, and the lower part shows those detected by the SURF algorithm. SIFT extracts 580 feature points in 621.81 ms, whereas SURF extracts 356 feature points in 256.5 ms. The feature extraction images show that the SURF algorithm extracts significantly fewer points than the SIFT algorithm and that the quality of the extracted feature points is relatively high. The SURF algorithm therefore eliminates, to some extent, invalid feature points outside the overlap region.

Mosaic image
In this paper, the L-M algorithm is used to mosaic the adjacent images into the panoramic image shown in Fig. 4. It can be seen from the figure that the stitched image is basically seamless, although parts of the edges are missing, and the overall fit is good. Because data with large errors are eliminated, mismatched same-name points are removed, so the stitching effect is better.

Conclusions
This paper reviews remote sensing image mosaic technology based on the SIFT algorithm and improves on it for high-resolution remote sensing images, in which too many feature points are detected.
The SURF operator is introduced into the low-altitude remote sensing image matching process, and an image mosaic technology based on the SURF algorithm is proposed. The difference between the feature points detected by the SIFT and SURF algorithms is compared and analyzed experimentally. The SURF algorithm eliminates, to a certain extent, the invalid feature points outside the overlap region. The results show that it greatly improves the matching speed of feature points while maintaining the high registration accuracy of the SIFT operator, and that it performs well in low-altitude remote sensing image mosaicking. In addition, an L-M-optimized global mosaic strategy is used to achieve a panoramic mosaic of the survey area, and better stitching results are obtained. Finally, the effectiveness of the SIFT and SURF algorithms for high-resolution remote sensing image mosaicking is compared.