### Image processing

The original photo captured by the navigation camera in this article is a color RGB image. A color model describes a color system developed in the field of image processing: its purpose is to represent colors in terms of a set of primary colors within a three-dimensional coordinate system. The RGB model is based on a Cartesian coordinate system, and its color subspace is the cube shown in Fig. 1. All color values are normalized, so all RGB values are assumed to lie in the range [0, 1]. Figure 1 shows the common RGB color cube [16].

In the RGB color space, the kiwifruit leaves and ground weeds cannot be separated in the R, G, or B component images. Moreover, because kiwifruit trees are grown in a scaffolding cultivation mode, the orchard scene is heavily shadowed, so grass, leaves, and trunks are mixed together and cannot be separated in subsequent processing. In the HSV space, the H component of the kiwifruit orchard image shows a clear boundary between the tree ridge and the grass and leaves on the ground, so it is easy to extract the tree-row information while also suppressing the influence of shadow. In the S component, the middle and upper branches share the same features as the grass, and both the trunk and the ridge are blurred, so it is difficult to extract the kiwifruit tree rows. Likewise, in the V component the gray values of the tree rows are close to those of the ground. From the above analysis, the H-channel grayscale image in the HSV space can effectively separate the kiwifruit tree ridges from the background, so this study converts the image from RGB to HSV. The specific process is as follows:

There is a conversion relationship between the HSV and RGB color spaces. The value component V is calculated by Eq. (1), and the saturation component S by Eq. (2) [17]:

$$ V=\max \left(R,G,B\right) $$

(1)

$$ S=\frac{V-\min \left(R,G,B\right)}{V} $$

(2)

The formula for calculating the hue component H can be expressed as:

$$ H=\left\{\begin{array}{ll}\dfrac{G-B}{\max \left(R,G,B\right)-\min \left(R,G,B\right)}\times {60}^{\circ }, & R=\max \left(R,G,B\right)\\ {}\left(2+\dfrac{B-R}{\max \left(R,G,B\right)-\min \left(R,G,B\right)}\right)\times {60}^{\circ }, & G=\max \left(R,G,B\right)\\ {}\left(4+\dfrac{R-G}{\max \left(R,G,B\right)-\min \left(R,G,B\right)}\right)\times {60}^{\circ }, & B=\max \left(R,G,B\right)\end{array}\right. $$

(3)
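The conversion of Eqs. (1)–(3) can be sketched in Python with NumPy (a minimal sketch; the function name and the assumption that the input is a float image in [0, 1] are choices of this example, not part of the paper):

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Convert an RGB image (floats in [0, 1]) to H, S, V per Eqs. (1)-(3)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = np.max(rgb, axis=-1)                  # Eq. (1): V = max(R, G, B)
    mn = np.min(rgb, axis=-1)
    delta = v - mn                            # max(R,G,B) - min(R,G,B)
    s = np.where(v > 0, delta / np.maximum(v, 1e-12), 0.0)  # Eq. (2)

    h = np.zeros_like(v)
    nz = delta > 0                            # hue undefined for gray pixels
    # Eq. (3): the hue branch depends on which channel holds the maximum
    rmax = nz & (v == r)
    gmax = nz & (v == g) & ~rmax
    bmax = nz & ~rmax & ~gmax
    h[rmax] = 60.0 * (g - b)[rmax] / delta[rmax]
    h[gmax] = 60.0 * (2.0 + (b - r)[gmax] / delta[gmax])
    h[bmax] = 60.0 * (4.0 + (r - g)[bmax] / delta[bmax])
    return np.mod(h, 360.0), s, v
```

The H channel returned here is the grayscale image used for ridge extraction in the following sections.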

### Image enhancement

The original image captured by the vision system contains considerable noise and distortion, so image quality must be improved before further analysis. The main purposes of enhancing the kiwifruit orchard image are to eliminate noise, improve contrast, and make the image easier for the vision system to process. Spatial-domain enhancement and frequency-domain enhancement are the two major classes of image-enhancement methods. This study mainly uses frequency-domain enhancement, filtering the original kiwifruit orchard image in the frequency domain. According to signal-analysis theory, frequency-domain filtering relies mainly on the Fourier transform and the convolution theorem. Let the kiwifruit orchard image *g*(*i*, *j*) be the result of convolving the function *f*(*i*, *j*) with *h*(*i*, *j*), as in Eqs. (4) and (5).

$$ g\left(i,j\right)=f\left(i,j\right)\bigotimes h\left(i,j\right) $$

(4)

$$ G\left(u,v\right)=F\left(u,v\right)H\left(u,v\right) $$

(5)

where *G*, *F*, and *H* are the Fourier transforms of the functions *g*(*i*, *j*), *f*(*i*, *j*), and *h*(*i*, *j*), respectively, and *H*(*u*, *v*) is the filter function, that is, the transfer function. The image filtering process can be divided into the following three steps: (1) The original kiwifruit orchard image *f*(*i*, *j*) is Fourier transformed to obtain *F*(*u*, *v*). (2) *F*(*u*, *v*) is multiplied by *H*(*u*, *v*) to obtain *G*(*u*, *v*), as in Eq. (5). (3) The inverse Fourier transform is applied to *G*(*u*, *v*) to obtain the enhanced image *g*(*i*, *j*).
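The three steps can be sketched as follows (a minimal NumPy version; centring the spectrum with `fftshift` so that *H*(*u*, *v*) can be defined around the image centre is an implementation choice of this sketch):

```python
import numpy as np

def frequency_filter(f, H):
    """Frequency-domain filtering: FFT -> multiply by H(u, v) -> inverse FFT.

    f : 2-D grayscale image.
    H : transfer function of the same shape, zero frequency at the centre.
    """
    F = np.fft.fftshift(np.fft.fft2(f))    # step (1): Fourier transform, centred
    G = F * H                              # step (2): pointwise product, Eq. (5)
    g = np.fft.ifft2(np.fft.ifftshift(G))  # step (3): inverse Fourier transform
    return np.real(g)                      # discard tiny imaginary residue
```

With *H* ≡ 1 the filter is the identity, which is a quick sanity check of the pipeline.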

### Filtering

The environment of a kiwifruit orchard in the scaffolding cultivation mode is complex, so the images acquired by the kiwifruit picking robot contain heavy environmental shadows. Among the many methods for processing shadowed images, homomorphic filtering is well suited to kiwifruit orchard images because it handles highlight and reflective areas with strong illumination effects. This paper analyzes how to overcome the influence of changing light intensity on the processing of kiwifruit orchard images so that the kiwifruit trunk can be effectively segmented as the recognition target. Homomorphic filtering is used here to enhance the kiwifruit orchard image, highlight the trunk features, and reduce the influence of background noise on trunk recognition. Homomorphic filtering is a nonlinear technique based on the generalized superposition principle and enhances image contrast in the frequency domain. The image *f*(*x*, *y*) is represented as the product of the illumination component *i*(*x*, *y*) and the reflection component *r*(*x*, *y*), as in Eq. (6).

$$ f\left(x,y\right)=i\left(x,y\right)\cdot r\left(x,y\right) $$

(6)

The basic principle of homomorphic filtering is that the pixel gray value is the product of illuminance and reflectance: the low-frequency component of the image corresponds to illuminance and the high-frequency component to reflectance. By processing the relationship between illuminance, reflectance, and pixel gray value, detail features in the shadowed areas can be recovered. In this study, homomorphic filtering is used to enhance the image, which effectively improves the segmentation result and reduces the influence of shadow. The homomorphic filter used here is a modification of a high-pass filter function, and its transfer function *H*(*i*, *j*) is given by Eq. (7):

$$ H\left(i,j\right)=\left( rh- rl\right)\times \left(-{e}^{-c{\left(\frac{D\left(i,j\right)}{d_0}\right)}^2}\right)+ rh $$

(7)
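The enhancement based on Eq. (7) can be sketched as follows (a minimal sketch: the log/exponential wrapping is the standard homomorphic pipeline implied by the illumination–reflectance product, and the *rh*, *rl* values are illustrative assumptions, while *c* = 4 and *d*₀ = 4.8 follow the parameter choices stated below Eq. (8)):

```python
import numpy as np

def homomorphic_filter(f, rh=2.0, rl=0.5, c=4.0, d0=4.8):
    """Homomorphic enhancement: filter log(f) with the transfer function of Eq. (7).

    rh > 1 boosts the high-frequency (reflectance) content while rl < 1
    attenuates the low-frequency (illumination) content.
    """
    rows, cols = f.shape
    n1, n2 = rows // 2, cols // 2             # half the row/column counts
    i, j = np.ogrid[:rows, :cols]
    D = np.sqrt((i - n1) ** 2 + (j - n2) ** 2)
    H = (rh - rl) * (-np.exp(-c * (D / d0) ** 2)) + rh   # Eq. (7)

    # log -> FFT -> multiply by H -> inverse FFT -> exp
    F = np.fft.fftshift(np.fft.fft2(np.log1p(f.astype(float))))
    g = np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
    return np.expm1(g)                        # undo the logarithm
```

Note that Eq. (7) evaluates to *rl* at the spectrum centre (*D* = 0) and approaches *rh* far from it, which is exactly the high-pass behaviour described above.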

### Image segmentation

Image segmentation of the kiwifruit orchard and the kiwifruit trunk is the core of this paper, and the specific segmentation method is as follows. The Otsu algorithm is used to segment the black ridges of the kiwifruit orchard. Because the gray histogram of the kiwifruit orchard image is bimodal, the H-channel image described above is selected for processing.

$$ H\left(i,j\right)=\left( rh- rl\right)\times \left(-{e}^{-c{\left(\frac{D\left(i,j\right)}{d_0}\right)}^2}\right)+ rh $$

(8)

In the formula, *rh* > 1 and *rl* < 1; \( D\left(i,j\right)=\sqrt{{\left(i-{n}_1\right)}^2+{\left(j-{n}_2\right)}^2} \), where *n*_{1} and *n*_{2} are half of the number of image rows and columns, respectively (taken as integers); *c* = 4 and *d*_{0} = 4.8. In this paper, homomorphic filtering is used to extract the kiwifruit trunk. These parameter values were selected through repeated tests to enhance the homomorphic filtering of the kiwifruit orchard image, and orchard images under different light intensities were selected to verify the filtering effect. Combined with the pretreatment results, the Otsu algorithm was used to segment the black ridges of the kiwifruit orchard. The algorithm was proposed by Otsu in 1979 and is considered one of the most widely used methods for automatically determining an image threshold because of its effective processing results. We assume that *T* is the segmentation threshold of the kiwifruit orchard image, the proportion of the kiwifruit trunk area in the whole image is *P*_{0}, and the average gray value of the trunk area is *u*_{0}; the corresponding proportion and average gray value of the regions outside the trunk are *P*_{1} and *u*_{1}. The total average gray value of the kiwifruit orchard image can then be expressed as:

$$ {u}_r={P}_0\times {u}_0+{P}_1\times {u}_1 $$

(9)

When the threshold is *T*, the variance is obtained.

$$ {\sigma}^2={P}_0{\left({u}_0-{u}_r\right)}^2+{P}_1{\left({u}_1-{u}_r\right)}^2 $$

(10)

For an image whose grayscale histogram has a distinct trough, we first select an approximate threshold *N* (close to the trough value) to split the image into two parts, *X*_{1} and *X*_{2}. The average gray values *ε*_{1} and *ε*_{2} of the two regions are then calculated, the new threshold is set to *N* = (*ε*_{1} + *ε*_{2})/2, and the process is repeated until successive threshold values no longer change.
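The iterative threshold selection just described can be sketched as follows (the function name, the image-mean initial guess, and the convergence tolerance are assumptions of this sketch):

```python
import numpy as np

def iterative_threshold(img, tol=0.5):
    """Iteratively refine a threshold N: split the image at N, recompute
    N = (eps1 + eps2) / 2 from the two region means, repeat until stable."""
    N = img.mean()                       # initial guess near the histogram trough
    while True:
        x1, x2 = img[img <= N], img[img > N]   # the two parts X1 and X2
        eps1 = x1.mean() if x1.size else N     # average gray of each region
        eps2 = x2.mean() if x2.size else N
        new_N = (eps1 + eps2) / 2.0
        if abs(new_N - N) < tol:         # stop when thresholds agree
            return new_N
        N = new_N
```

On a strongly bimodal image the threshold settles midway between the two mode means.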

In the kiwifruit orchard environment, the growth of the trees varies and the lighting conditions are complex, so threshold segmentation alone cannot effectively separate the kiwifruit trunk information from the rest of the orchard background. This study therefore proposes a region-growing method for segmenting orchard images with the kiwifruit trunk as the feature. The method transforms the kiwifruit orchard image into the RGB and HSV spaces and compares the differences between the tree rows and the surrounding scene in the different color channels. Taking a sample kiwifruit orchard image as an example, the original color image is separated into its color channels. In the RGB space, the gray histogram of the B component has peaks and valleys, but the gray values overlap too much, and the background and target regions cannot be separated. In the HSV space, the H component has obvious peaks and valleys; the H-channel image shows that the ridge can be separated from the background, but the trunk cannot be separated from the ridge. Therefore, none of the RGB or HSV channels alone can effectively separate the kiwifruit trunk. In this paper, the directly grayed kiwifruit orchard image, enhanced by the homomorphic filtering of the third chapter, is selected as the region-growing domain. Following the purpose of this study, the kiwifruit orchard image is homomorphically filtered, and the average gray value *P*_{i} of the kiwifruit trunk area of each filtered image is calculated: *n* points in the trunk area are selected at random, and the gray value of the *j*th point is denoted *g*_{j}.

$$ {P}_i=\frac{1}{n}\sum \limits_{j=1}^n{g}_j $$

(11)

In order to reduce the error of the average gray value of the kiwi trunk region, the degree of deviation of the gray value of the trunk region in each image from the average gray value is calculated. The standard deviation (*σ*_{i}) is used to measure the degree of deviation of the gray value as follows:

$$ {\sigma}_i=\sqrt{\frac{1}{n}\sum \limits_{j=1}^n{\left({x}_j-{p}_i\right)}^2} $$

(12)

where *σ*_{i} is the standard deviation of the gray values of the trunk region in the *i*th orchard image, *x*_{j} is the gray value of the *j*th pixel selected in the region, and *P*_{i} is the average gray value from Eq. (11).
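Eqs. (11) and (12) amount to the mean and population standard deviation of the sampled trunk pixels, for example (the helper name and the point-list input format are illustrative, not from the paper):

```python
import numpy as np

def trunk_gray_stats(gray, trunk_points):
    """Mean P_i (Eq. 11) and standard deviation sigma_i (Eq. 12) of n
    sampled trunk pixels; trunk_points is a list of (row, col) indices."""
    samples = np.array([gray[r, c] for r, c in trunk_points], dtype=float)
    p_i = samples.mean()                              # Eq. (11)
    sigma_i = np.sqrt(np.mean((samples - p_i) ** 2))  # Eq. (12)
    return p_i, sigma_i
```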

After the homomorphic filtering of the kiwifruit orchard image, the gray value of the kiwifruit trunk area is close to 0; owing to lighting changes, the gray values of the trunk area in each image are distributed below 10.35. Therefore, considering the actual situation, the average gray value of the kiwifruit trunk area over the eight sample images selected in this study is 3.81, and the threshold is set to *T*′ = 30. After the seed point is selected, the next important step is to determine the growth criterion.

The pixels of the kiwifruit orchard image are judged by similarity. The growth procedure has the following main steps: (1) The preprocessed kiwifruit orchard image is scanned to find pixels that have not yet been assigned to any region. (2) Starting from such a pixel, its neighborhood is grown by checking the four neighbors of the point (up, down, left, and right); a neighbor whose gray-level difference is less than the predetermined threshold is merged into the region. (3) Growth continues from the selected seed point until no new seed point can be found or the region can no longer be expanded. (4) The above steps are repeated until all pixels have been assigned and the growth is complete.
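The steps above can be sketched as a queue-based region growing (a minimal sketch; comparing each neighbor against the seed's gray value, rather than against its immediate neighbor, is an assumption of this example):

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, thresh=30):
    """4-neighborhood region growing from one seed: a neighbor is merged
    when its gray-level difference from the seed is below thresh (T' = 30)."""
    rows, cols = gray.shape
    mask = np.zeros((rows, cols), dtype=bool)
    seed_val = float(gray[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        # four neighbors: up, down, left, right
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not mask[nr, nc]
                    and abs(float(gray[nr, nc]) - seed_val) < thresh):
                mask[nr, nc] = True      # merge the similar neighbor
                queue.append((nr, nc))
    return mask
```

Running this from each unassigned seed in turn, as described in steps (1)–(4), labels the whole image.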

### Fruit positioning

Before the binocular stereo vision system is used for positioning, the camera must first be calibrated, and camera calibration largely determines the accuracy of the acquired three-dimensional information. To clarify the calibration principle, the commonly used coordinate systems and the conversions between them are first described. The image coordinate system represents the projection of a spatial object point onto the image plane. The target fruit is undoubtedly the most relevant information in the images acquired by the kiwifruit picking robot. In this section, the target-fruit information identified in the previous frame is used to reduce the recognition time of the current frame, narrowing the search region frame by frame. A single-arm picking robot can pick only one fruit at a time, so when there are multiple fruits in the image, the target fruit to be picked must be determined. The center of the target fruit can be found as follows. When the fruit is not occluded, the regions in the processed fruit-segmentation image are labeled, and the two-dimensional centroid coordinates of each labeled fruit region are obtained by Eq. (13). At the same time, the side length of each fruit region is calculated, and finally the target fruit is determined by Eq. (14) according to the principle of minimum distance to the image center.

$$ \left\{\begin{array}{c}x=\sum \limits_{\left(i,j\right)\in \varOmega}\frac{i}{n}\\ {}y=\sum \limits_{\left(i,j\right)\in \varOmega}\frac{j}{n}\end{array}\right. $$

(13)

In the formula, *i* and *j* represent the horizontal and vertical coordinates of the fruit image pixel, respectively; *n* represents the total number of pixels of the fruit image; and *Ω* represents the set of pixels belonging to the same fruit image.

$$ d=\sqrt{{\left({x}_0-{x}_c\right)}^2+{\left({y}_0-{y}_c\right)}^2} $$

(14)

In the formula, *x*_{0} and *y*_{0} denote the fruit centroid coordinates, and *x*_{c} and *y*_{c} represent the image center coordinates.
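Eqs. (13) and (14) can be combined into a small target-selection sketch (the labeled-image input and the function name are assumptions of this example; label 0 is taken as background):

```python
import numpy as np

def pick_target_fruit(labels, image_center):
    """Centroid of each labeled fruit region (Eq. 13), then the fruit
    whose centroid is nearest the image center (Eq. 14) is the target."""
    x_c, y_c = image_center
    best_label, best_d = None, np.inf
    for lab in np.unique(labels):
        if lab == 0:                       # 0 = background, not a fruit
            continue
        ii, jj = np.nonzero(labels == lab)
        x0, y0 = ii.mean(), jj.mean()      # Eq. (13): region centroid
        d = np.hypot(x0 - x_c, y0 - y_c)   # Eq. (14): distance to center
        if d < best_d:
            best_label, best_d = lab, d
    return best_label, best_d
```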