Open Access
14 September 2018 Real-time unmanned aerial vehicle tracking of fast moving small target on ground
Junhua Yan, Jun Du, Yong Young, Christopher R. Chatwin, Rupert C. D. Young, Philip Birch
Funded by: National Natural Science Foundation of China (NSFC), National Natural Science Foundation of China, Aeronautical Science Foundation of China, Science and Technology on Avionics Integration Laboratory and Aeronautical Science Foundation of China, National Natural Science Foundation of Jiangsu Province, Natural Science Foundation of Jiangsu Province
Abstract
To solve the problems of occlusion and fast motion of small targets in unmanned aerial vehicle target tracking, an adaptive algorithm that fuses the improved color histogram tracking response and the correlation filter tracking response based on multichannel histogram of oriented gradient features is proposed to realize small target tracking with high accuracy. The state judgment index is used to determine whether the target is in a fast motion or an occlusion state. In the fast motion state, the search area is enlarged, and the color optimal model that suppresses the suspected area is used for rough detection. Then, redetection in the location of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. In an occlusion state, the model stops updating, the search area is expanded, and the current color model is used for rough detection. Then, redetection in the place of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. Experimental results show that the proposed method can track small targets accurately. The frame rate of the proposed method is 40.23 frames/s, indicating usable real-time performance.

1.

Introduction

During unmanned aerial vehicle (UAV) target tracking, the target is far away from the camera; hence, the target pixel size in the image (the number of pixels occupied by the target) is small. In addition, when the UAV moves swiftly, the camera is actively adjusted, and the target position shift between adjacent frames may exceed 20 pixels. In a target tracking review article with more than 1000 citations,1 the above two cases are classified as low resolution and fast motion, respectively. Low resolution can also cause the target to be blocked easily. These factors make it challenging to track the ground moving small target accurately and in real time.

When the target occupies a small number of pixels, limited feature information is obtained from target pixels. High-level features that are capable of more powerful feature expression are favored in such circumstances to ensure the robustness of the tracking method. Danelljan et al.2 effectively improved the tracking effect of the method in their paper3 using multichannel color features instead of grayscale features. However, a single color feature is not sufficient for capturing all illumination changes. Henriques et al.4 used the multichannel histogram of oriented gradient (HOG)5 to represent the target, which can well represent the local shape feature of the target. Hence, the tracking effect of the correlation filtering was significantly improved. However, trackers based on HOG often perform poorly when the target undergoes motion or serious deformation. Experiments showed that the above two single feature models cannot favorably cope with small targets, resulting in target drifting. Danelljan et al.6 won the 2016 VOT-challenge with a comprehensive combination of the multichannel color feature, a well-trained convolutional neural network (CNN) feature, and the HOG feature. However, due to the limited number of online training samples in target tracking, the over-dimensioned feature vector easily leads to overfitting; this method requires the updating of more than 800,000 model parameters with each use, making it difficult to fulfill the real-time requirement of target tracking.

Expanding the search area to obtain a larger sampling area is one of the ways to deal with fast moving targets. However, the amount of computation is increased and the false alarm rate rises due to the introduction of objects similar to the target. To cope with fast movements of the target, Ma et al.7 introduced an online random fern classifier, which is similar to tracking-learning-detection,8 to redetect targets. However, the redetection module is based on grayscale features, so it is difficult to achieve good redetection performance in a large area. Zhang et al.9 used multiple trackers as an expert group to conduct semisupervised loss judgment on the expert group's tracking results to select the optimal tracking result and improve the reliability of the tracker. However, it is still difficult for the method to deal with disturbing objects in the search area based on a single grayscale feature. Additionally, each frame requires multiple tracking and detection passes, making it difficult to achieve real-time performance. Zhu et al.10 used edge boxes11 to obtain areas with more closed edge information as a global candidate area instead of using a local search area. However, when the target is small, its edge information is relatively limited, making it difficult for edge boxes to accurately locate the target. In addition, the edge box method requires sampling of a large number of areas to improve the probability that the target is detected, compromising real-time performance.

Small targets are easily obscured, which increases the difficulty of tracking. Jia12 used a local sparse representation of the target to cope with partial occlusions of the target. Zhao et al.13 used an innovative keypoint matching-based tracker to handle the partial occlusion problem, yet these two methods cannot cope with relatively large occlusion sizes. Also, the average frame rate of this method on the OTB20131 dataset is 8.5 frames/s, not satisfying the real-time requirement of target tracking. In addition, small targets that lack information are not suitable for local sparse representation. Kalal et al.8 introduced the online random fern classifier to redetect targets, but the redetection module is based on simple grayscale features, so it is difficult to obtain good redetection results. In the case that the target is completely obscured, Yan et al.14 used the Kalman filter method to estimate the target position to achieve the target tracking, though the position estimation method cannot accurately estimate how the target would separate from the occlusion.

In this paper, to solve the problems of fast motions and occlusions of the target in UAV target tracking, an adaptive algorithm that fuses the improved color histogram tracking response and the correlation filter tracking response based on multichannel HOG features is proposed to realize stronger feature expression for small targets. The state judgment index is used to determine whether the target is in a fast motion or an occlusion state. In the fast motion state, the search area is enlarged, and the color optimal model that suppresses the suspected area is used for rough detection. Then, redetection in the location of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. In the occlusion state, the model stops updating, the search area is expanded, and the current color model is used for rough detection. Then, redetection in the location of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. The block diagram of real-time UAV tracking of fast moving small target on ground is shown in Fig. 1.

Fig. 1

Block diagram of real time UAV tracking of fast moving small target on ground.


2.

Target Tracking Method by Fusing Two Tracking Models

The HOG is a statistical feature based on the local gradient direction, which cannot cope with target deformations well. The global color distribution of the target does not change greatly with target deformations. Therefore, the global color feature can better deal with target deformations. By contrast, the color feature cannot deal with illumination changes very well, whereas the HOG uses gamma correction to normalize the contrast of the original image and can better deal with illumination changes. The color feature and the HOG complement each other. Hence, a tracking model based on the fusion of these two features is expected to represent the small target more powerfully and track it more accurately. A flowchart of the proposed target tracking method by the fusion of two tracking models is shown in Fig. 2.

Fig. 2

Flowchart of target tracking method by fusing two tracking models.


2.1.

Tracking Response of the Correlation Filter Model Based on Local Multichannel HOG Features

The correlation filter tracking method based on multichannel local HOG features is divided into the training stage and the detection stage. In the training stage, the optimal correlation filter is obtained by training the sample set, and the optimal filter is updated according to the fast updating strategy. Multichannel local HOG features are extracted for each pixel in the local search area of the previous frame, which are then used to form a matrix, and the rows and columns of the matrix are cyclically shifted to obtain a training sample set. According to the characteristics of the circulant matrix, the discrete Fourier domain was used to solve the correlation filter instead of the Ridge regression to avoid matrix inversion, reducing the complexity of the algorithm by several orders of magnitude and achieving real-time performance.4 In the detection stage, multichannel local HOG features are extracted for each pixel in the local search area of the current frame, which are then used to form a matrix, and the rows and columns of the matrix are cyclically shifted to obtain a to-be-detected sample set. The correlation filter response score for each sample set is obtained according to the updated optimal filter, and the coordinates of the sample with the highest score are set as the center location.

2.1.1.

Characteristics of the circulant matrix

In this section, the one-dimensional (1-D) single channel signal, which is methodologically similar to the two-dimensional (2-D) multichannel signal, is used to describe the acceleration characteristic of the circulant matrix.4 Suppose that the 1-D single channel signal is represented by an $n \times 1$ vector, denoted as $\mathbf{x} = [x_0, x_1, x_2, \ldots, x_{n-1}]$; then the circulant matrix $X$ is obtained by the cyclic shift $C(\mathbf{x})$ of $\mathbf{x}$, shown as

Eq. (1)

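$$X = C(\mathbf{x}) = \begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_q \\ \vdots \\ \mathbf{x}_{n-1} \end{bmatrix} = \begin{bmatrix} x_0 & x_1 & \cdots & x_{n-1} \\ x_{n-1} & x_0 & \cdots & x_{n-2} \\ \vdots & \vdots & & \vdots \\ x_{n-q} & x_{n-q+1} & \cdots & x_{n-q-1} \\ \vdots & \vdots & & \vdots \\ x_1 & x_2 & \cdots & x_0 \end{bmatrix}.$$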

The circulant matrix $X$ is the training sample set. Each row vector $\mathbf{x}_q$ is a sample, and the corresponding label vector is $\mathbf{y}$:

Eq. (2)

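$$\mathbf{y} = [y_0, y_1, y_2, \ldots, y_q, \ldots, y_{n-1}]^{T}, \qquad y_q = \begin{cases} 0, & \mathbf{x}_q \text{ is a negative sample} \\ 1, & \mathbf{x}_q \text{ is a positive sample} \end{cases}.$$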

The goal of training is to find a function $f(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}$ that minimizes the squared error between the samples $\mathbf{x}_q$ and their label values $y_q$, as shown in Eq. (3), in which $\lambda$ is a regularization parameter that controls overfitting:

Eq. (3)

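$$\min_{\mathbf{w}} \sum_{q} \left[ f(\mathbf{x}_q) - y_q \right]^{2} + \lambda \left\| \mathbf{w} \right\|^{2},$$
where $\mathbf{w}$ is the coefficient vector to be solved for. The regularized least-squares (ridge regression) solution for $\mathbf{w}$ can be computed as follows: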

Eq. (4)

$$\mathbf{w} = (X^{H}X + \lambda I)^{-1} X^{H}\mathbf{y},$$
where the superscript H is the conjugate transpose. Directly solving Eq. (4) involves matrix inversion, which demands a huge amount of computation, compromising the real-time performance of the tracking method.

To reduce computational complexity, Eq. (4) is transformed into the frequency domain. According to Ref. 15, the circulant matrix is diagonalized by the discrete Fourier transform matrix F:

Eq. (5)

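$$X = C(\mathbf{x}) = F\,\mathrm{diag}(\hat{\mathbf{x}})\,F^{H},$$
where $\hat{\mathbf{x}}$ represents the discrete Fourier transform of $\mathbf{x}$, $\hat{\mathbf{x}} = \mathcal{F}(\mathbf{x})$. Substituting Eq. (5) into Eq. (4) gives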

Eq. (6)

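$$\begin{aligned}
\mathbf{w} &= \left[ F\,\mathrm{diag}(\overline{\hat{\mathbf{x}}})F^{H} F\,\mathrm{diag}(\hat{\mathbf{x}})F^{H} + \lambda F\,\mathrm{diag}(\boldsymbol{\delta})F^{H} \right]^{-1} F\,\mathrm{diag}(\overline{\hat{\mathbf{x}}})F^{H}\mathbf{y} \\
&= \left[ F\,\mathrm{diag}(\overline{\hat{\mathbf{x}}} \odot \hat{\mathbf{x}} + \lambda\boldsymbol{\delta})F^{H} \right]^{-1} F\,\mathrm{diag}(\overline{\hat{\mathbf{x}}})F^{H}\mathbf{y} \\
&= F\,\mathrm{diag}\!\left( \frac{\overline{\hat{\mathbf{x}}}}{\overline{\hat{\mathbf{x}}} \odot \hat{\mathbf{x}} + \lambda\boldsymbol{\delta}} \right) F^{H}\mathbf{y} \\
&= C\!\left[ \mathcal{F}^{-1}\!\left( \frac{\overline{\hat{\mathbf{x}}}}{\overline{\hat{\mathbf{x}}} \odot \hat{\mathbf{x}} + \lambda\boldsymbol{\delta}} \right) \right] \mathbf{y}.
\end{aligned}$$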

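In Eq. (6), $\mathrm{diag}(\overline{\hat{\mathbf{x}}})\,\mathrm{diag}(\hat{\mathbf{x}}) = \mathrm{diag}(\overline{\hat{\mathbf{x}}} \odot \hat{\mathbf{x}})$, where $\odot$ denotes the element-wise product and the fraction is also taken element-wise; $\boldsymbol{\delta}$ is an all-1 vector and is omitted in the following equations; and $\overline{\hat{\mathbf{x}}}$ represents the complex conjugate of $\hat{\mathbf{x}}$.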

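Then, by the convolution property of the circulant matrix discussed in Ref. 16,

$$\mathcal{F}[C(\mathbf{x})\mathbf{y}] = \mathcal{F}(\tilde{\mathbf{x}} * \mathbf{y}) = \overline{\mathcal{F}(\mathbf{x})} \odot \mathcal{F}(\mathbf{y}),$$
where $\tilde{\mathbf{x}}$ represents $\mathbf{x}$ in reverse order. A Fourier transform is carried out on both sides of Eq. (6) to solve for $\mathbf{w}$: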

Eq. (7)

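$$\mathcal{F}(\mathbf{w}) = \overline{\mathcal{F}\!\left[ \mathcal{F}^{-1}\!\left( \frac{\overline{\hat{\mathbf{x}}}}{\hat{\mathbf{x}} \odot \overline{\hat{\mathbf{x}}} + \lambda} \right) \right]} \odot \mathcal{F}(\mathbf{y}),$$

Eq. (8)

$$\mathbf{w} = \mathcal{F}^{-1}\!\left( \frac{\hat{\mathbf{x}} \odot \hat{\mathbf{y}}}{\overline{\hat{\mathbf{x}}} \odot \hat{\mathbf{x}} + \lambda} \right).$$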
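As a numerical check of Eqs. (4) and (8), the following minimal NumPy sketch builds the circulant matrix of Eq. (1) for a short real-valued signal and compares the direct ridge regression solution with the Fourier-domain one. The function names, the random test signal, and the value of λ are illustrative assumptions and do not come from the paper; the agreement relies on NumPy's FFT sign convention together with rows formed by `np.roll(x, q)`.

```python
import numpy as np

def circulant(x):
    """Rows are cyclic shifts of x as in Eq. (1): row q is x shifted right by q."""
    return np.stack([np.roll(x, q) for q in range(len(x))])

def ridge_direct(x, y, lam):
    """Eq. (4): explicit solve of (X^H X + lam I) w = X^H y (cubic cost in n)."""
    X = circulant(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)

def ridge_fourier(x, y, lam):
    """Eq. (8): element-wise division in the Fourier domain (n log n cost)."""
    x_hat, y_hat = np.fft.fft(x), np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.conj(x_hat) * x_hat + lam)
    return np.fft.ifft(w_hat).real  # w is real for real x and y

rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # illustrative signal, not real image data
y = rng.standard_normal(16)   # illustrative labels
w_direct = ridge_direct(x, y, lam=0.1)
w_fourier = ridge_fourier(x, y, lam=0.1)
print(np.max(np.abs(w_direct - w_fourier)))  # only a small numerical residual is expected
```

The direct solve inverts an n×n matrix, whereas the Fourier version needs only FFTs and element-wise operations, which is the source of the speedup described above.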

2.1.2.

Correlation filter tracking method based on multichannel HOG feature

Training stage: The local search area $D_{t-1}$ for training is selected by setting the center pixel $P_{t-1}$ of the target tracking box of the previous frame $I_{t-1}$ as the center of $D_{t-1}$. The width and height of $D_{t-1}$ are $W$ and $H$, respectively, which are twice the width and height of the target in the previous frame.17 The multichannel local HOG feature $dh_{w,h}^{N}$ is extracted for each pixel $(w,h)$ in $D_{t-1}$, where $N$ is the number of channels of the feature. A $W \times H$ matrix $DH^{N}$ is constructed using $dh_{w,h}^{N}$; each element $dh_{w,h}^{N}$ of the matrix is an $N$-dimensional vector. Training samples $\{DH_{w,h}^{N} \mid w \in \{0,1,\ldots,W-1\},\ h \in \{0,1,\ldots,H-1\}\}$ are generated by a cyclic shift operation on $DH^{N}$. The training samples are used to train the optimal correlation filter $h_{cf}^{N}$ so that it has the highest filtering response to the sample centered on $(w,h)$ in $D_{t-1}$. The training process is a ridge regression process. Its purpose is to minimize the loss, as shown in Eq. (9):

Eq. (9)

$$\arg\min_{h_{cf}^{N}} \sum_{W,H} \left( \left\| \sum_{n=1}^{N} h_{cf}^{n} * DH_{w,h}^{n} - g_{w,h} \right\|^{2} + \lambda \sum_{n=1}^{N} \left\| h_{cf}^{n} \right\|^{2} \right),$$
where $*$ represents the convolution operation, $DH_{w,h}^{n}$ $(n = 1, 2, \ldots, N)$ is the component of the sample in each channel, and $h_{cf}^{n}$ $(n = 1, 2, \ldots, N)$ is the component of the correlation filter $h_{cf}^{N}$ in each channel. Here, $g_{w,h}$ is the ideal 2-D Gaussian response corresponding to $DH_{w,h}^{N}$. The term $\left\| \sum_{n=1}^{N} h_{cf}^{n} * DH_{w,h}^{n} - g_{w,h} \right\|^{2}$ represents the loss function, and $\lambda \sum_{n=1}^{N} \left\| h_{cf}^{n} \right\|^{2}$ is the regularization term that prevents overfitting. Note that $\lambda$ is a regularization parameter, which must be $>0$, and is assigned the optimal value 0.001 derived in Ref. 4. The idea discussed in Sec. 2.1.1 is applied to solve Eq. (9), and the optimal correlation filter of the previous frame in the frequency domain is obtained:

Eq. (10)

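$$\hat{h}_{cf}^{n} = \frac{\overline{\hat{g}} \odot \widehat{DH}^{n}}{\sum_{i=1}^{N} \left( \overline{\widehat{DH}^{i}} \odot \widehat{DH}^{i} \right) + \lambda} = \frac{A^{n}}{B + \lambda}.$$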

The optimal filter is updated according to the fast updating strategy proposed in Ref. 16:

Eq. (11)

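$$A_{t}^{n} = (1 - \eta) A_{t-1}^{n} + \eta\, \overline{\hat{g}}_{t-1} \odot \widehat{DH}_{t-1}^{n},$$

Eq. (12)

$$B_{t} = (1 - \eta) B_{t-1} + \eta \sum_{i=1}^{N} \overline{\widehat{DH}_{t-1}^{i}} \odot \widehat{DH}_{t-1}^{i},$$
where $\eta$ is an update parameter, which determines the update rate. A larger $\eta$ means a greater impact of the current frame on the model, indicating a faster model update. In this paper, $\eta$ is assigned the optimal value of 0.01 derived in Ref. 16.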

Detection stage: The local search area $D_t$ for detection is selected by setting the center pixel $P_{t-1}$ of the target tracking box of the previous frame $I_{t-1}$ as the center of $D_t$. The multichannel local HOG feature $dh_{w,h}^{N}$ is extracted for each pixel $(w,h)$ in $D_t$, where $N$ is the number of channels of the feature. A $W \times H$ matrix $DH^{N}$ is constructed using $dh_{w,h}^{N}$; each element $dh_{w,h}^{N}$ of the matrix is an $N$-dimensional vector. Detection samples $\{DH_{w,h}^{N} \mid w \in \{0,1,\ldots,W-1\},\ h \in \{0,1,\ldots,H-1\}\}$ are generated by a cyclic shift operation on $DH^{N}$. According to the updated optimal filter, the correlation filtering response score of each sample is obtained as follows:

Eq. (13)

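$$S_{cf}(w,h) = \mathcal{F}^{-1}\left\{ \frac{\sum_{n=1}^{N} A_{t}^{n} \odot \widehat{DH}_{w,h}^{n}}{B_{t} + \lambda} \right\}.$$

The position of the target center pixel $x$ in $D_t$ is set to be the coordinates of the point $(w_{\max}, h_{\max})$ with the highest response score, and the correlation filtering tracking response score is $S_{cf}(x)$.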
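The training, update, and detection steps of Eqs. (10) to (13) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random arrays stand in for HOG features, all function names are hypothetical, and the numerator is conjugated at detection (a convention used by DSST-style trackers with NumPy's FFT) so that a cyclic shift of the input moves the response peak by the same offset.

```python
import numpy as np

def gaussian_label(height, width, sigma=2.0):
    """Ideal Gaussian response g with its peak at index (0, 0), wrapping cyclically."""
    rows, cols = np.arange(height), np.arange(width)
    dy = np.minimum(rows, height - rows)[:, None]
    dx = np.minimum(cols, width - cols)[None, :]
    return np.exp(-(dy ** 2 + dx ** 2) / (2.0 * sigma ** 2))

def train(features, g):
    """Numerator A^n and denominator B from an (N, H, W) feature stack."""
    f_hat = np.fft.fft2(features, axes=(-2, -1))
    g_hat = np.fft.fft2(g)
    a = np.conj(g_hat)[None, :, :] * f_hat
    b = np.sum(np.conj(f_hat) * f_hat, axis=0).real
    return a, b

def update(a_prev, b_prev, a_new, b_new, eta=0.01):
    """Running average in the form of Eqs. (11) and (12)."""
    return (1 - eta) * a_prev + eta * a_new, (1 - eta) * b_prev + eta * b_new

def detect(a, b, features, lam=1e-3):
    """Response over all cyclic shifts; conj(a) is the assumed convention here."""
    z_hat = np.fft.fft2(features, axes=(-2, -1))
    return np.fft.ifft2(np.sum(np.conj(a) * z_hat, axis=0) / (b + lam)).real

rng = np.random.default_rng(1)
template = rng.standard_normal((4, 32, 32))             # stand-in for 4-channel HOG
a, b = train(template, gaussian_label(32, 32))
shifted = np.roll(template, shift=(5, 3), axis=(1, 2))  # target displaced by (5, 3)
response = detect(a, b, shifted)
peak = np.unravel_index(np.argmax(response), response.shape)
print(peak)  # location of the strongest response, i.e., the estimated displacement
```

Because every cyclic shift is scored at once by a single inverse FFT, the cost per frame stays close to that of a few 2-D FFTs per channel.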

2.2.

Target Tracking Response Based on Improved Global Color Feature

In color histogram tracking, the probability that the pixel $x$ in the current local search area $D_t$ belongs to the target is obtained by constructing the normalized RGB color histogram of the target and looking up the table. According to the normalized color histogram $\mathrm{Hist}_{fg}$ of the foreground and the normalized color histogram $\mathrm{Hist}_{bg}$ of the background of the current frame, the probability $p_{fg}(x)$ that the pixel $x$ belongs to the foreground and the probability $p_{bg}(x)$ that the pixel $x$ belongs to the background are calculated, respectively:

Eq. (14)

$$p_{fg}(x) = \mathrm{Hist}_{fg}(i_x),$$

Eq. (15)

$$p_{bg}(x) = \mathrm{Hist}_{bg}(i_x),$$
where $i_x$ is the index of the color histogram bin to which the pixel $x$ belongs.

According to Ref. 18, the probability that pixel x belongs to the target in the search area is denoted as

Eq. (16)

$$p(x \in O \mid D_t) = \frac{p_{fg}(x)}{p_{fg}(x) + p_{bg}(x)}.$$

To adapt the representation to changing object appearance and illumination conditions, the object model is updated on a regular basis using linear interpolation, $P_t(x \in O \mid D_t) = \eta_c P(x \in O \mid D_t) + (1 - \eta_c) P_{t-1}(x \in O \mid D_t)$, with a learning rate $\eta_c$.
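The table lookup of Eqs. (14) to (16) can be illustrated with a short NumPy sketch. The 16 bins per channel, the small constant `eps` that guards against division by zero, and all function names are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def bin_index(image, bins=16):
    """Map each RGB pixel of a uint8 image (H, W, 3) to a single histogram bin index."""
    q = (image.astype(np.int64) * bins) // 256
    return (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]

def normalized_hist(index_map, mask, bins=16):
    """Normalized color histogram of the pixels selected by a boolean mask."""
    hist = np.bincount(index_map[mask], minlength=bins ** 3).astype(float)
    return hist / max(hist.sum(), 1.0)

def target_probability(index_map, hist_fg, hist_bg, eps=1e-12):
    """Eq. (16): per-pixel probability of belonging to the target."""
    p_fg = hist_fg[index_map]   # Eq. (14)
    p_bg = hist_bg[index_map]   # Eq. (15)
    return p_fg / (p_fg + p_bg + eps)

# Toy frame: a red 3x3 target on a blue background.
image = np.zeros((8, 8, 3), dtype=np.uint8)
image[..., 2] = 255
image[2:5, 2:5] = (255, 0, 0)
fg_mask = np.zeros((8, 8), dtype=bool)
fg_mask[2:5, 2:5] = True

idx = bin_index(image)
prob = target_probability(idx, normalized_hist(idx, fg_mask), normalized_hist(idx, ~fg_mask))
print(prob[3, 3], prob[0, 0])  # high inside the target, low on the background
```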

The integral image $I$ of the probability map over the search area $D_t$ is calculated, and the response score $S_{hist}(x)$ of the target box in $D_t$, centered on pixel $x$ and having the size of the target, is obtained:

Eq. (17)

$$S_{hist}(x) = I(i + W/2,\ j + H/2) + I(i,\ j) - I(i + W/2,\ j) - I(i,\ j + H/2),$$
where $W$ and $H$ are the width and height of the current target, respectively, and $(i,j)$ represents the horizontal and vertical coordinates of the pixel $x$.

The position of the target center pixel $x$ in $D_t$ is set to be the coordinates of the point $(i_{\max}, j_{\max})$ with the highest response score, and the color tracking response score is $S_{hist}(x)$.
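The box response can be computed for every candidate center with one integral image, as in the following NumPy sketch. It follows the prose description of a target-sized box centered on $x$, so its corner offsets differ from those printed in Eq. (17); this choice, the zero-padded integral image, and the function names are assumptions of the example.

```python
import numpy as np

def integral_image(prob):
    """Zero-padded integral image: I[r, c] is the sum of prob[:r, :c]."""
    out = np.zeros((prob.shape[0] + 1, prob.shape[1] + 1))
    out[1:, 1:] = prob.cumsum(axis=0).cumsum(axis=1)
    return out

def box_response(prob, box_h, box_w):
    """Sum of prob inside a box_h x box_w box centered on every pixel (clipped at borders)."""
    height, width = prob.shape
    integ = integral_image(prob)
    top = np.clip(np.arange(height) - box_h // 2, 0, height)
    bottom = np.clip(np.arange(height) - box_h // 2 + box_h, 0, height)
    left = np.clip(np.arange(width) - box_w // 2, 0, width)
    right = np.clip(np.arange(width) - box_w // 2 + box_w, 0, width)
    # Four-corner combination of the integral image for all centers at once.
    return (integ[bottom][:, right] - integ[top][:, right]
            - integ[bottom][:, left] + integ[top][:, left])

prob = np.zeros((10, 10))
prob[4:7, 4:7] = 1.0                      # a 3x3 blob of target-like pixels
scores = box_response(prob, box_h=3, box_w=3)
center = np.unravel_index(np.argmax(scores), scores.shape)
print(center, scores[center])             # best box center and its score
```

Each box sum costs four lookups regardless of the box size, which keeps the color response cheap even when the search area is enlarged.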

If the target is relatively small, drifting to areas with a similar color is likely to happen. To cope with drifting, the current method suppresses areas with suspected color similarities to reduce interference from these areas.

When the response score $S_{hist}(x)$ of a box area satisfies Eq. (18), the area is considered to be a suspected area:

Eq. (18)

$$S_{hist}(x_{dis}) \geq \theta_0 \max[S_{hist}(x)], \quad \theta_0 \in [0,1],$$
where $x_{dis}$ represents the central position of the suspected rectangular area and $\theta_0$ is the threshold parameter, which is arbitrarily set to be 0.8 here.

The suspected areas are sorted according to their response scores. The normalized color histogram set $\{\mathrm{Hist}_{dis}^{n} \mid n = 1, \ldots, N\}$ for the first $N$ suspected areas is calculated. Then, the probability that pixel $x$ belongs to each suspected area is calculated, followed by recalculation of the probability that pixel $x$ in $D_t$ belongs to the target as shown in Eq. (19):

Eq. (19)

$$P_t(x \in O \mid D_t) = \frac{p_{fg}(x)}{p_{fg}(x) + p_{bg}(x) + \frac{1}{N} \sum_{n=1}^{N} p_{dis}^{n}(x)}.$$

Then, the color tracking response score $S_{hist}(x)$ of the target tracking box in the search area $D_t$ is recalculated using Eq. (17).
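A minimal sketch of the thresholding in Eq. (18) and the suppressed probability in Eq. (19) is given here. How neighboring above-threshold positions are grouped into distinct suspected areas is not specified in the text, so `suspected_centers` simply returns every qualifying position; the `eps` guard and the function names are likewise assumptions.

```python
import numpy as np

def suspected_centers(s_hist, theta0=0.8):
    """Eq. (18): positions whose box score is at least theta0 times the maximum score."""
    return np.argwhere(s_hist >= theta0 * s_hist.max())

def suppressed_probability(p_fg, p_bg, p_dis, eps=1e-12):
    """Eq. (19): p_dis is a list of probability maps, one per suspected area."""
    penalty = np.mean(p_dis, axis=0) if len(p_dis) else 0.0
    return p_fg / (p_fg + p_bg + penalty + eps)

p_fg = np.array([0.6])
p_bg = np.array([0.2])
plain = suppressed_probability(p_fg, p_bg, [])
suppressed = suppressed_probability(p_fg, p_bg, [np.array([0.4]), np.array([0.0])])
print(plain, suppressed)  # the suppressed value is lower where suspected colors match
```

A pixel whose color also appears in the suspected areas gains a larger denominator, so its target probability drops, while pixels with colors unique to the target are almost unaffected.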

To test whether the suppression of color suspicious areas can effectively reduce interference from suspected areas, a comparative experiment on images with small targets and color-like areas in the UAV123 dataset is carried out. Some experimental results are shown in Fig. 3.

Fig. 3

Color tracking responses with and without color suspicious area suppression: (a) original image (the target is within the black box), (b) probability map of pixels belonging to the target without suppression, and (c) probability map of pixels belonging to the target with suppression.


The second column in Fig. 3 is the probability map of pixels belonging to the target without suppression. Probability values at the target area are high, yet those of color suspicious areas are also high, causing interference to target tracking. The third column in Fig. 3 is the probability map of pixels belonging to the target with suppression. Responses in suspected areas are suppressed. The decrease in probability values at the target area in the map is less than that at suspected areas, which makes the probability value of the target more prominent. Thus, experiments show that suppression of color-like areas can effectively reduce interference from suspected areas.

2.3.

Fusion Dual Model Tracking Response

Adaptive fusion of the improved color histogram tracking response and the correlation filter tracking response based on the multichannel HOG feature is carried out to determine the center position $P_t$ of the target tracking box in the current frame $I_t$:

Eq. (20)

$$P_t = \arg\max_{x \in D_t} S_f(x),$$

Eq. (21)

$$S_f(x) = S_{cf}(x) + f[S_{hist}(x)],$$
where $S_f(x)$ is the tracking response score at $x$ of the fusion dual model. When there are many suspected areas, the target tracking box determined by the target tracking response based on the improved global color feature is likely to drift to areas with a similar color. To keep the location determined by the fusion dual model tracking response accurate, the color tracking response score $S_{hist}(x)$ is adaptively adjusted as follows:

Eq. (22)

$$f[S_{hist}(x)] = \begin{cases} [S_{hist}(x)]^{1/2}, & N = 1 \\ S_{hist}(x), & 1 < N \leq 2 \\ [S_{hist}(x)]^{(1 + N/2)}, & 2 < N \leq 5 \end{cases},$$
where $N$ is the number of suspected areas. When $N$ is large, the color tracking response score is considered not credible enough, so its value is reduced to lower the impact of the color tracking response on the overall fusion probability. Conversely, when $N$ is small, the color tracking response score is considered credible, and its value is increased to obtain better tracking results.
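The adaptive fusion of Eqs. (20) to (22) can be sketched as follows. The sketch assumes that the color score has been normalized to the range [0, 1], so that a square root raises it and a power above 1 lowers it, and it clamps values of N outside 1 to 5 because Eq. (22) does not define them; both points and the function names are assumptions.

```python
import numpy as np

def adjust_color_score(s_hist, n_suspected):
    """Eq. (22); n_suspected outside 1..5 is clamped (an assumption of this sketch)."""
    n = min(max(int(n_suspected), 1), 5)
    if n == 1:
        return np.sqrt(s_hist)
    if n <= 2:
        return s_hist
    return s_hist ** (1 + n / 2)

def fuse(s_cf, s_hist, n_suspected):
    """Eqs. (20) and (21): fused response map and the position of its maximum."""
    s_f = s_cf + adjust_color_score(s_hist, n_suspected)
    return np.unravel_index(np.argmax(s_f), s_f.shape), s_f

s_cf = np.array([[0.1, 0.2], [0.3, 0.0]])
s_hist = np.array([[0.0, 0.0], [0.0, 0.49]])
pos_few, _ = fuse(s_cf, s_hist, n_suspected=1)    # color score trusted and boosted
pos_many, _ = fuse(s_cf, s_hist, n_suspected=5)   # color score distrusted and damped
print(pos_few, pos_many)
```

With one suspected area the color peak decides the location, whereas with five suspected areas the damped color score lets the correlation filter response dominate.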

Target tracking results of the adaptive fusion dual model are compared with target tracking results of single models as shown in Fig. 4. In each image in Fig. 4, the green box demonstrates the tracking result of the correlation filter model based on multichannel HOG features. The yellow box demonstrates the tracking result of the improved global color histogram feature model. The red box demonstrates the tracking result of the proposed adaptive fusion dual model. The black box marks the true location of the target.

Fig. 4

Comparison of target tracking results of the adaptive fusion dual model and two single models.


The overlap score (OS) is used to measure the accuracy of target tracking results. OS is calculated as follows:

Eq. (23)

$$OS = \frac{\left| B_{gt} \cap B_t \right|}{\left| B_{gt} \cup B_t \right|},$$
where $B_{gt}$ represents the true target position and $B_t$ represents the target location identified by the tracking method. Higher OS scores indicate higher accuracy.
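For axis-aligned boxes, Eq. (23) reduces to an intersection-over-union computation, sketched here in plain Python. The `(x, y, width, height)` box layout and the function name are assumptions of the example.

```python
def overlap_score(box_gt, box_t):
    """Eq. (23) for axis-aligned boxes given as (x, y, width, height)."""
    left = max(box_gt[0], box_t[0])
    top = max(box_gt[1], box_t[1])
    right = min(box_gt[0] + box_gt[2], box_t[0] + box_t[2])
    bottom = min(box_gt[1] + box_gt[3], box_t[1] + box_t[3])
    intersection = max(0, right - left) * max(0, bottom - top)
    union = box_gt[2] * box_gt[3] + box_t[2] * box_t[3] - intersection
    return intersection / union if union > 0 else 0.0

print(overlap_score((0, 0, 2, 2), (1, 1, 2, 2)))  # partial overlap: 1 shared unit out of 7
```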

Figure 4 and Table 1 show that the proposed target tracking method that fuses multiple tracking models can achieve better OS, indicating higher tracking accuracy.

Table 1

Comparison of OS scores of target tracking results of the adaptive fusion dual model and two single models.

| Tracking method | Correlation filter tracking | Color tracking | Fusion two model tracking |
| --- | --- | --- | --- |
| OS of the first image | 0.469 | 0.746 | 0.750 |
| OS of the second image | 0.431 | 0.353 | 0.508 |
| OS of the third image | 0.238 | 0.157 | 0.395 |
| OS of the fourth image | 0.710 | 0.666 | 0.721 |

3.

Fast Moving or Occluded Target Tracking

In the process of target tracking, rapid movement of the target leads to rapid location changes of the target in the video. Consequently, the target easily moves out of the local search area, resulting in tracking failure. In addition, the small size of the target makes it an easy victim of occlusion. In this paper, a tracking method that copes with target fast motions and target occlusions is proposed.

3.1.

Target Tracking under Target Fast Motion

In a target tracking review article with more than 1000 citations,1 the two cases described above are classified as fast motion and low resolution. A target is considered to be in the fast motion state when its position shifts by more than 20 pixels between adjacent frames:

Eq. (24)

$$(i_{t-1} - i_{t-2}) > 20 \quad \text{or} \quad (j_{t-1} - j_{t-2}) > 20,$$
where $(i_{t-1}, j_{t-1})$ and $(i_{t-2}, j_{t-2})$ are the coordinates of the centers $P_{t-1}$ and $P_{t-2}$ of the tracking box in frames $t-1$ and $t-2$, respectively.
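The state test of Eq. (24) amounts to a per-axis comparison of the last two box centers. The sketch here uses the absolute displacement so that motion in either direction is detected; that reading of Eq. (24), like the function name, is an assumption.

```python
def is_fast_motion(center_prev, center_prev2, threshold=20):
    """True when the box center moved more than `threshold` pixels along either axis."""
    di = abs(center_prev[0] - center_prev2[0])
    dj = abs(center_prev[1] - center_prev2[1])
    return di > threshold or dj > threshold

print(is_fast_motion((130, 80), (100, 78)))  # 30-pixel horizontal shift
```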

3.1.1.

Update the model parameters of the correlation filter model and color feature model

When the target is in the fast motion state, model parameters of the correlation filter model and the color histogram model need to be adjusted to cope with changes in target posture and illumination.

The correlation filter model parameters $A^n$ and $B$, the foreground histogram feature $\mathrm{Hist}_{fg}$, and the fusion response peak value $\max(S_f)$ of each of the $2FR$ frames preceding the current frame (FR indicates the video frame rate) form a set, which is divided into $L$ segments in time order. From each segment, the parameters $A_l^n$, $B_l$, and $\mathrm{Hist}_{fg}(l)$ of the frame with the largest response peak value are selected to form an expert group $[A_l^n, B_l, \mathrm{Hist}_{fg}(l)]$. Weighted summation is performed on the members of the expert group to obtain the optimal correlation filter model and the foreground histogram feature needed in color tracking. Taking into account the temporal correlation between frames in a video sequence, greater weights are assigned to parameters from frames that are closer to the current frame:

Eq. (25)

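$$A^{n} = \frac{2}{L(L+1)} \sum_{l=1}^{L} l \times A_{l}^{n},$$

Eq. (26)

$$B = \frac{2}{L(L+1)} \sum_{l=1}^{L} l \times B_{l},$$

Eq. (27)

$$\mathrm{Hist}_{fg} = \frac{2}{L(L+1)} \sum_{l=1}^{L} l \times \mathrm{Hist}_{fg}(l).$$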
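The expert group selection and the weighting of Eqs. (25) to (27) can be sketched as follows. The weights $2l/[L(L+1)]$ sum to 1 and grow linearly toward the most recent segment. Splitting the history with `np.array_split`, the requirement of at least $L$ stored frames, and the function names are assumptions of the example.

```python
import numpy as np

def expert_weights(num_segments):
    """Weights 2l / (L(L+1)) for l = 1..L; the most recent segment weighs most."""
    l = np.arange(1, num_segments + 1)
    return 2.0 * l / (num_segments * (num_segments + 1))

def expert_average(history, peaks, num_segments):
    """Pick the highest-peak frame in each time segment, then average with the weights.

    history: array of shape (T, ...) holding one model parameter per frame, oldest first.
    peaks:   array of shape (T,) with the fusion response peak of each frame (T >= L).
    """
    segments = np.array_split(np.arange(len(peaks)), num_segments)
    best = [int(seg[np.argmax(peaks[seg])]) for seg in segments]
    return np.tensordot(expert_weights(num_segments), history[best], axes=1), best

history = np.arange(6.0)                          # toy scalar parameter per frame
peaks = np.array([0.1, 0.9, 0.8, 0.2, 0.3, 0.7])  # toy response peaks
value, chosen = expert_average(history, peaks, num_segments=3)
print(chosen, value)
```

The same call can be applied separately to the stored $A^n$, $B$, and foreground histogram arrays, since the weights do not depend on the parameter type.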

3.1.2.

Redetect to find the true target

To track the fast moving target in real time, the local search area in color tracking is expanded to $2D_t$, and color primary detection is performed to obtain the color tracking response score $S_{hist}(x)$.

In color tracking, the true target area can be mistaken for a suspected area and, hence, be suppressed. Also, when the true target area and the suspected object are close, the suspected object can be missed because it yields a subpeak response close to the peak response area. In either case, multiple peak areas appear in the final color tracking response, and the highest peak response does not necessarily represent the true target. For instance, as shown in Fig. 5, in the upper left image, the red box marks the true target, whereas the green box is the suspected object, and the response value of the suspected object in the color tracking response is higher than that of the true target.

Fig. 5

Schematic of multipeak redetection.


To accurately track the true target, this paper uses the optimal correlation filter obtained in Sec. 3.1.1 to redetect the multipeak positions in the color tracking response to determine the true target. The multipeak positions $\{x_p^n \mid n \in 1, \ldots, N_1\}$ of the color tracking response are determined first, where $N_1$ is the number of peak positions. The multipeak positions are determined using Eq. (28):

Eq. (28)

$$S_{hist}(x_p) > \theta_1 \max[S_{hist}(x)],$$
where $\theta_1$ is assigned 0.8. Then, the local search areas $\{D_t^n \mid n \in 1, \ldots, N_1\}$ are selected at $x_p^n$. The optimal correlation filter is used to redetect each local area to obtain the multipeak redetection response set $\{S_{cf}^n(x) \mid n \in 1, \ldots, N_1\}$, where the peak position of $\max[S_{cf}^n(x)]$ is the target center position $P_t$.
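The coarse-to-fine redetection can be sketched as a peak search on the color response followed by a rescoring of each candidate. In this sketch a peak is taken to be an 8-neighborhood local maximum above the threshold of Eq. (28), and `cf_peak_score` is a hypothetical callable that returns the maximum correlation filter response in the local area around a candidate; neither detail is specified in the paper.

```python
import numpy as np

def multipeak_positions(s_hist, theta):
    """Local maxima of s_hist whose score exceeds theta times the global maximum."""
    s_hist = np.asarray(s_hist, dtype=float)
    h, w = s_hist.shape
    padded = np.pad(s_hist, 1, mode="constant", constant_values=-np.inf)
    shifts = [(dy, dx) for dy in (0, 1, 2) for dx in (0, 1, 2) if (dy, dx) != (1, 1)]
    neighbours = np.stack([padded[dy:dy + h, dx:dx + w] for dy, dx in shifts])
    is_peak = (s_hist >= neighbours.max(axis=0)) & (s_hist > theta * s_hist.max())
    return [(int(r), int(c)) for r, c in np.argwhere(is_peak)]

def redetect(s_hist, cf_peak_score, theta=0.8):
    """Return the candidate peak with the highest correlation filter score.

    Assumes s_hist has a positive maximum, so at least one candidate exists.
    """
    candidates = multipeak_positions(s_hist, theta)
    scores = [cf_peak_score(p) for p in candidates]
    return candidates[int(np.argmax(scores))]

s_hist = np.zeros((10, 10))
s_hist[2, 2], s_hist[7, 7], s_hist[5, 1] = 1.0, 0.9, 0.5   # distractor, target, weak blob
toy_cf = {(2, 2): 0.3, (7, 7): 0.8}                        # toy filter scores per candidate
target = redetect(s_hist, lambda p: toy_cf[p], theta=0.8)
print(multipeak_positions(s_hist, 0.8), target)
```

In the toy data the distractor has the highest color score, yet the correlation filter score selects the other peak, which mirrors the situation described for Fig. 5.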

3.2.

Target Tracking in Occlusion

3.2.1.

Judge the degree of occlusion of the target

Fusion dual model target tracking response [Sf(x)] is shown in Fig. 6. Original images are shown in the first row in Fig. 6. The target is a pedestrian and is obscured by the car during tracking. The fusion tracking response maps [Sf(x)] corresponding to the local search area are shown in the second row. In the first column of Fig. 6, the target is not occluded and there is only a single peak corresponding to the target in the tracking response map. Except for the sharp peak at the center of the target, the rest of the map is relatively smooth. In the second column, the target is partially occluded, and there are many peaks in the tracking response map, with no single maximum peak value and with large fluctuations. In the third column, the target is more occluded, and an additional peak appears in the tracking response map, with an even larger overall fluctuation.

Fig. 6

Fusion of dual model tracking result and tracking response map: (a) OCC = 3.0472, (b) OCC = 2.2965, and (c) OCC = 2.2811.


To cope with this problem, an indicator OCC for judging the degree of occlusion of the target is proposed:

Eq. (29)

$$OCC = \frac{\max[S_f(x)] - \min[S_f(x)]}{\mathrm{mean}\left\{ \sum_{W,H} \left| S_f(x) - \min[S_f(x)] \right| \right\}},$$
where $W$ and $H$ represent the width and the height, respectively, of the response map corresponding to the local search area. This indicator reflects the degree of smoothness of the response map and the confidence level that the peak is in the center of the target.

In the process of target tracking, the value of OCC, which is used to judge the degree of occlusion of the target, is shown in the third row of Fig. 6. The first column has the largest OCC value. The second column has a smaller OCC value, and the third column has an even smaller OCC value. Figure 6 shows that the OCC value can be used to judge the degree of occlusion of the target.

In this paper, when the OCC value of frame $I_t$ is less than $\beta$ times the OCC value of frame $I_{t-1}$, the target is considered occluded. Here, $\beta$ is assigned 0.8.
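A minimal sketch of the occlusion indicator and the β test follows. It reads the denominator of Eq. (29) as the mean absolute deviation of the response map from its minimum, and it returns 0 for a completely flat map; both choices and the function names are assumptions of the example.

```python
import numpy as np

def occ(s_f):
    """Eq. (29): peak height relative to the average height of the response map."""
    spread = np.mean(np.abs(s_f - s_f.min()))
    if spread == 0:
        return 0.0  # flat map with no peak (assumed convention)
    return float((s_f.max() - s_f.min()) / spread)

def is_occluded(occ_now, occ_prev, beta=0.8):
    """Occlusion is declared when OCC drops below beta times its previous value."""
    return occ_now < beta * occ_prev

clean = np.zeros((5, 5))
clean[2, 2] = 1.0                       # single sharp peak
cluttered = clean.copy()
cluttered[0, 4] = 1.0                   # a second, equally strong peak
print(occ(clean), occ(cluttered), is_occluded(occ(cluttered), occ(clean)))
```

A second peak raises the mean level of the map without raising its maximum, so OCC falls, which is the behavior reported for the partially occluded frames in Fig. 6.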

3.2.2.

Redetect the occluded target

A small target is easily obscured. When the occlusion indicator OCC shows that the target is occluded, the local search area is expanded to $2D_t$. First, the color primary detection is performed to determine the multipeak positions $\{x_p^n \mid n \in 1, \ldots, N_2\}$, where $N_2$ is the number of peak positions. The multipeak positions are determined by Eq. (30):

Eq. (30)

$$S_{hist}(x_p) > \theta_2 \max[S_{hist}(x)],$$
where $\theta_2$ is assigned a value of 0.7.

Then, the local search areas $\{D_t^n \mid n \in 1, \ldots, N_2\}$ are selected at $x_p^n$. The preocclusion correlation filter is used to redetect each local area to obtain the multipeak redetection response set $\{S_{cf}^n(x) \mid n \in 1, \ldots, N_2\}$, where the peak position of $\max[S_{cf}^n(x)]$ is the target center position $P_t$.

4.

Flowchart of the Proposed Method

The flowchart of the proposed real-time UAV tracking of a fast moving small target on the ground is shown in Fig. 7.

Fig. 7

Flowchart of real time UAV tracking of a fast moving small target on ground.


In the target tracking process, first, the center of the local search area $D_t$ in the current frame $I_t$ is set to be the center position $P_{t-1}$ of the target tracking result of frame $I_{t-1}$. The correlation filter tracking response $S_{cf}(x)$ and the color tracking response $S_{hist}(x)$ of the pixel $x$ in $D_t$ are calculated, and the score $S_f(x)$ is obtained by adaptively combining the two responses. The peak position of $S_f(x)$ is taken as the target center position $P_t$ of frame $I_t$. In the fast motion state, the proposed method uses the optimal correlation filter to redetect the multipeak positions of the color tracking response to determine the true target. The optimal correlation filter is used to redetect each local area to obtain the multipeak redetection response set $\{S_{cf}^n(x) \mid n \in 1, \ldots, N_1\}$. The peak position of $\max[S_{cf}^n(x)]$ is the target center position $P_t$. In the occlusion state, the color primary detection is performed. The preocclusion correlation filter is used to redetect the local areas at the multipeak positions to obtain the multipeak redetection response set $\{S_{cf}^n(x) \mid n \in 1, \ldots, N_2\}$. The peak position of $\max[S_{cf}^n(x)]$ is the target center position $P_t$.

5.

Experimental Results and Analysis

A set of video sequences containing small targets, fast motion, and occlusion characteristics from the database UAV123 are selected for the experiment, including a total of 15 groups and 6611 images.19 The image size is 1280×720 pixels. The tracking targets include people, cars, and other objects. All targets have fine manual annotation. The proposed method of this paper is compared with eight other state-of-the-art methods, including the CN tracker that uses color attributes as effective features,2 the KCF tracker that uses the multichannel HOG feature,4 the DSST tracker that relieves the scaling issue using the feature pyramid and the three-dimensional (3-D) correlation filter,16 the LCT tracker that uses the online random fern classifier as the redetection component for long-term tracking,7 the DAT tracker that uses the color histogram feature and suppresses the background area,20 and the Staple tracker that fuses the color tracker and correlation filter tracker linearly.18 The above-mentioned six methods have outstanding tracking results, and the speed of tracking meets the real-time requirement. Also, the MEEM9 tracker that uses the multiple tracker expert group to realize fast tracking, and the 2016 VOT Challenge champion C-COT6 that uses deep convolutional neural network features are included in the comparison experiments.

All experimental results and related performance evaluations were obtained using the same data and initialization conditions. Experimental environment: MATLAB 2016; experimental platform: 3.60 GHz Intel i7 CPU, 64-bit Windows 7 operating system, 8 GB of memory.

Datasets come from Ref. 21.

5.1.

Comparison of Tracking Results for the Different Methods

The tracking performance of the proposed method is compared with that of CN, KCF, DSST, DAT, LCT, Staple, MEEM, and C-COT on the video sequence set in which the targets are small, fast moving, and occluded, as shown in Fig. 8. The white box marks the ground-truth location of the target, against which the tracking position obtained by each algorithm is compared.

Fig. 8

Comparison of tracking results of the proposed method and state-of-the-art tracking methods: (a) Bike2, (b) Truck4, (c) Car11, (d) Bike3, (e) Wakeboard5, (f) Car14, (g) Car13, and (h) Truck3.

The targets in the first and second rows (Bike2 and Truck4) are <200 pixels in size, and the target in the third row (Car11) is <100 pixels in size. Experimental results show that the tracking boxes of the other eight methods easily lose the target or drift to suspected objects. Because of its adaptive fusion of multifeature models, the proposed method better characterizes small targets that carry little feature information, which improves the success rate of target tracking. In the third and fourth rows (Car11 and Bike3), the targets are occluded, and the other eight methods are unable to handle the occlusion, resulting in tracking failure. The proposed method judges whether the target is occluded and switches to the corresponding tracking strategy when occlusion is detected, so the target is tracked successfully. In the fifth and sixth rows (Wakeboard5 and Car14), the target position shifts by >20 pixels between frames, and the other eight methods lose the target. The proposed method judges whether the target is in fast motion and switches to the corresponding tracking strategy when fast motion is detected, so the target is tracked successfully. In the seventh and eighth rows (Car13 and Truck3), there are strong interfering objects near the true target, and most of the eight methods fail to track it. The proposed method suppresses the suspected areas effectively and greatly reduces their interference, so it resists the impact of strong interfering objects on small targets and tracks the target successfully.

5.2.

Performance Comparison Experiment

5.2.1.

Experiment of overlap success rate

If the overlap score of the tracking result in frame $I_t$ exceeds a given threshold, the method is considered to have tracked the target successfully in that frame. The overlap success rate [1] is the ratio of the number of successfully tracked frames to the total number of frames. The overlap score is defined in Eq. (23).
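As a hedged illustration of this metric, the sketch below assumes that the overlap score is the usual intersection-over-union of the tracked box and the annotated box, as in the benchmark of Ref. 1. The `(x, y, width, height)` box layout and the function names are assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), assumed layout


def overlap_score(tracked: Box, truth: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    tx, ty, tw, th = tracked
    gx, gy, gw, gh = truth
    inter_w = max(0.0, min(tx + tw, gx + gw) - max(tx, gx))
    inter_h = max(0.0, min(ty + th, gy + gh) - max(ty, gy))
    inter = inter_w * inter_h
    union = tw * th + gw * gh - inter
    return inter / union if union > 0 else 0.0


def overlap_success_rate(tracked: List[Box], truth: List[Box],
                         threshold: float) -> float:
    """Fraction of frames whose overlap score exceeds the threshold."""
    hits = sum(1 for t, g in zip(tracked, truth)
               if overlap_score(t, g) > threshold)
    return hits / len(truth)


# Two toy frames: a perfect match and a complete miss.
tracked_boxes = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
truth_boxes = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 10.0, 10.0)]
rate = overlap_success_rate(tracked_boxes, truth_boxes, threshold=0.5)
```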

The comparison of the overlap success rate of the proposed method with that of the other eight methods is shown in Fig. 9. In this paper, the area under curve (AUC) of the overlap success rate curve was used to evaluate the performance of the tracking methods because it is considered a more accurate evaluation of the overall tracking performance. The AUC values of all methods tested are listed after each method name in the figure legend of Fig. 9.
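The AUC of a success rate curve can be approximated numerically. The sketch below assumes an evenly spaced threshold grid with 21 points and trapezoidal integration; the grid, the function names, and the per-frame scores are made up for illustration and are not taken from the paper.

```python
from typing import List


def success_curve(scores: List[float], thresholds: List[float]) -> List[float]:
    """Success rate (fraction of frames with score > threshold) per threshold."""
    n = len(scores)
    return [sum(1 for s in scores if s > t) / n for t in thresholds]


def area_under_curve(xs: List[float], ys: List[float]) -> float:
    """Trapezoidal-rule area under the points (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


thresholds = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00 (assumed grid)
scores = [0.82, 0.64, 0.10, 0.55]         # made-up per-frame overlap scores
auc = area_under_curve(thresholds, success_curve(scores, thresholds))
```

Because the threshold axis spans [0, 1], the AUC also lies in [0, 1], which makes a single number comparable across trackers regardless of which threshold is chosen.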

Fig. 9

Comparison of the overlap success rate of the proposed method with the other eight methods.

As shown in Fig. 9, the proposed method has the highest AUC, indicating the best overall overlap success rate among the compared methods. When the overlap threshold is <0.5, the overlap success rate of the proposed method is substantially higher than that of the other methods. However, the success rate of the proposed method is slightly lower than that of CCOT when the overlap threshold is high. This is because the proposed method mainly targets small objects and therefore does not adopt a complex scale-adaptive strategy.

5.2.2.

Experiment of distance precision rate

If the Euclidean distance between the center of the tracking result in frame $I_t$ and the given target center is within a given location error threshold, the method is considered to have tracked the target precisely in that frame. The distance precision rate is the ratio of the number of precisely tracked frames to the total number of frames. The comparison of the distance precision rate of the proposed method with that of the other eight methods is shown in Fig. 10. The horizontal axis denotes the location error threshold, and the vertical axis denotes the distance precision rate. The AUC value is again used as the evaluation index because it evaluates the overall performance of the methods more accurately than the rate at a single threshold. The AUC values of all methods tested are listed after each method name in the figure legend of Fig. 10.
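A minimal sketch of the distance precision rate follows. It assumes centers are given as `(x, y)` pixel coordinates and that "within" the threshold means less than or equal to it; the function names and the toy coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) center in pixels, assumed layout


def center_error(tracked: Point, truth: Point) -> float:
    """Euclidean distance between the tracked and annotated centers."""
    return math.hypot(tracked[0] - truth[0], tracked[1] - truth[1])


def distance_precision_rate(tracked: List[Point], truth: List[Point],
                            threshold: float) -> float:
    """Fraction of frames whose center error is within the threshold."""
    hits = sum(1 for t, g in zip(tracked, truth)
               if center_error(t, g) <= threshold)
    return hits / len(truth)


# Three toy frames with center errors of 0, 5, and 50 pixels.
tracked_centers = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]
truth_centers = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
rate_at_5 = distance_precision_rate(tracked_centers, truth_centers, 5.0)
rate_at_4 = distance_precision_rate(tracked_centers, truth_centers, 4.0)
```

Sweeping the threshold and plotting the resulting rates gives a curve of the kind shown in Fig. 10.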

Fig. 10

Comparison of the distance precision rate of the proposed method with the other eight methods.

As shown in Fig. 10, the proposed method has the highest AUC, indicating the best overall distance precision rate among the compared methods. When the location error threshold is <5, the distance precision rate of the proposed method is substantially higher than that of the other methods. However, the distance precision rate of the proposed method is slightly lower than that of CCOT when the location error threshold is small. This is because the proposed method mainly targets small objects and therefore does not adopt a complex scale-adaptive strategy.

5.2.3.

Experiment of average center location error

The average center location error (ACLE) is the average Euclidean distance between the center of the tracking result and the given target center. Table 2 shows the ACLE of the proposed method and the other eight methods.

Table 2

Comparison of the center location error of the proposed method with that of the other eight methods.

Tracker        Proposed  CCOT   MEEM   STAPLE  DAT    KCF    DSST   CN     LCT
ACLE (pixels)  7.27      25.82  29.39  63.18   70.58  79.43  88.32  89.75  138.80

Table 2 shows that the average center location error of the proposed method (7.27 pixels) is much smaller than that of the other eight methods, the best of which (CCOT) reaches 25.82 pixels. This indicates that the tracking performance of the proposed method is better than that of the other methods.
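For reference, the average center location error can be computed as sketched below. The `(x, y)` center layout, the function name, and the toy coordinates are assumptions for illustration; the averaging over all frames of a sequence follows the definition given above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) center in pixels, assumed layout


def average_center_location_error(tracked: List[Point],
                                  truth: List[Point]) -> float:
    """Mean Euclidean distance between tracked and annotated centers."""
    errors = [math.hypot(tx - gx, ty - gy)
              for (tx, ty), (gx, gy) in zip(tracked, truth)]
    return sum(errors) / len(errors)


# Three toy frames with center errors of 0, 5, and 10 pixels.
tracked_centers = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
truth_centers = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
acle = average_center_location_error(tracked_centers, truth_centers)
```

Unlike the precision rate, this average is sensitive to frames in which the tracker has drifted far from the target, so a single lost sequence can dominate the value.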

5.2.4.

Comparison of the real-time performance between methods

The proposed method is compared with the other eight methods in terms of real-time performance, measured in frames per second (fps). The fps of each method is shown in Table 3.

Table 3

Comparison of the frames per second of the proposed method with that of the other eight methods.

Tracker  Proposed  CCOT  MEEM  STAPLE  DAT    KCF     DSST   CN     LCT
fps      40.23     2.58  6.17  66.47   20.33  142.62  32.62  87.79  23.11

According to Table 3, the proposed method has a higher fps than CCOT, MEEM, DAT, DSST, and LCT, indicating better real-time performance than these methods. STAPLE, KCF, and CN run at a higher fps than the proposed method; however, the tracking accuracy of the proposed method is superior to theirs because of its multifeature model and its strategies for coping with fast motion and occlusion of small targets.
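The fps values in Table 3 can be read as per-frame processing times. The sketch below converts them and also shows one plausible way to measure fps; the helper names and the timing procedure are assumptions, since the paper does not describe how the frame rates were timed.

```python
import time
from typing import Callable, Dict, Sequence

# Frame rates copied from Table 3.
FPS: Dict[str, float] = {
    "Proposed": 40.23, "CCOT": 2.58, "MEEM": 6.17, "STAPLE": 66.47,
    "DAT": 20.33, "KCF": 142.62, "DSST": 32.62, "CN": 87.79, "LCT": 23.11,
}


def ms_per_frame(fps: float) -> float:
    """Average processing time per frame in milliseconds."""
    return 1000.0 / fps


def measure_fps(track_frame: Callable[[object], object],
                frames: Sequence[object]) -> float:
    """Hypothetical timing loop: frames processed divided by elapsed seconds."""
    start = time.perf_counter()
    for frame in frames:
        track_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


# Trackers listed from fastest to slowest with their per-frame time.
for name, fps in sorted(FPS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<9}{fps:>8.2f} fps{ms_per_frame(fps):>9.2f} ms/frame")
```

At 40.23 fps the proposed method spends roughly 24.9 ms on each frame, whereas CCOT at 2.58 fps needs close to 390 ms.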

6.

Conclusion

An adaptive algorithm that fuses the improved color histogram tracking response and the correlation filter tracking response based on multichannel HOG features is proposed to realize small target tracking with high accuracy. The state judgment index is used to determine whether the target is in a fast motion or an occlusion state. In the fast motion state, the search area is enlarged, and the color optimal model that suppresses the suspected area is used for rough detection. Then, redetection at the locations of the multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. In the occlusion state, the model stops updating, the search area is expanded, and the current color model is used for rough detection. Then, redetection at the locations of the multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. The proposed method is compared with other state-of-the-art methods using the UAV123 dataset. Experimental results show that the proposed method can accurately track a fast moving small target in real time. The fps of the proposed method is 40.23, indicating good real-time performance. This paper studies single target tracking; future research will address multitarget tracking. Based on multitarget time-domain information and airspace information, an accurate real-time tracking method for UAV multitarget tracking will be developed.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61471194 and 61705104), Science and Technology on Avionics Integration Laboratory and Aeronautical Science Foundation of China (20155552050), and the Natural Science Foundation of Jiangsu Province (BK20170804).

References

1. Y. Wu, J. Lim, and M. H. Yang, “Online object tracking: a benchmark,” in IEEE Conf. on Computer Vision and Pattern Recognition, 2411–2418 (2013). https://doi.org/10.1109/CVPR.2013.312

2. M. Danelljan et al., “Adaptive color attributes for real-time visual tracking,” in IEEE Conf. on Computer Vision and Pattern Recognition, 1090–1097 (2014). https://doi.org/10.1109/CVPR.2014.143

3. J. F. Henriques et al., “Exploiting the circulant structure of tracking-by-detection with kernels,” in European Conf. on Computer Vision, 702–715 (2012).

4. J. F. Henriques et al., “High-speed tracking with kernelized correlation filters,” IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2015). https://doi.org/10.1109/TPAMI.2014.2345390

5. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 886–893 (2005). https://doi.org/10.1109/CVPR.2005.177

6. M. Danelljan et al., “Beyond correlation filters: learning continuous convolution operators for visual tracking,” in European Conf. on Computer Vision, 472–488 (2016).

7. C. Ma et al., “Long-term correlation tracking,” in IEEE Conf. on Computer Vision and Pattern Recognition, 5388–5396 (2015). https://doi.org/10.1109/CVPR.2015.7299177

8. Z. Kalal, K. Mikolajczyk, and J. Matas, “Tracking-learning-detection,” IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1409–1422 (2012). https://doi.org/10.1109/TPAMI.2011.239

9. J. Zhang, S. Ma, and S. Sclaroff, “MEEM: robust tracking via multiple experts using entropy minimization,” Lect. Notes Comput. Sci. 8694, 188–203 (2014). https://doi.org/10.1007/978-3-319-10599-4

10. G. Zhu, F. Porikli, and H. Li, “Beyond local search: tracking objects everywhere with instance-specific proposals,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 943–951 (2016). https://doi.org/10.1109/CVPR.2016.108

11. C. L. Zitnick and P. Dollár, “Edge boxes: locating object proposals from edges,” Lect. Notes Comput. Sci. 8693, 391–405 (2014). https://doi.org/10.1007/978-3-319-10602-1

12. X. Jia, “Visual tracking via adaptive structural local sparse appearance model,” in IEEE Conf. on Computer Vision and Pattern Recognition, 1822–1829 (2012). https://doi.org/10.1109/CVPR.2012.6247880

13. Q. Zhao et al., “Object tracking via kernel-based forward-backward keypoint matching,” Proc. SPIE 10225, 1022504 (2017). https://doi.org/10.1117/12.2266440

14. J. H. Yan et al., “Target tracking with improved CAMShift based on Kalman predictor,” J. Chin. Inertial Technol. 22(4), 537–542 (2014). https://doi.org/10.13695/j.cnki.12-1222/o3.2014.04.021

15. R. M. Gray, “Toeplitz and circulant matrices: a review,” Found. Trends Commun. Inf. Theory 2(3), 155–239 (2005). https://doi.org/10.1561/0100000006

16. M. Danelljan et al., “Discriminative scale space tracking,” IEEE Trans. Pattern Anal. Mach. Intell. 39, 1561–1575 (2017). https://doi.org/10.1109/TPAMI.2016.2609928

17. H. K. Galoogahi, T. Sim, and S. Lucey, “Multi-channel correlation filters,” in IEEE Int. Conf. on Computer Vision, 3072–3079 (2013). https://doi.org/10.1109/ICCV.2013.381

18. L. Bertinetto et al., “Staple: complementary learners for real-time tracking,” in IEEE Conf. on Computer Vision and Pattern Recognition (2016). https://doi.org/10.1109/CVPR.2016.156

19. M. Mueller, N. Smith, and B. Ghanem, “A benchmark and simulator for UAV tracking,” Lect. Notes Comput. Sci. 9905, 445–461 (2016). https://doi.org/10.1007/978-3-319-46448-0

20. H. Possegger, T. Mauthner, and H. Bischof, “In defense of color-based model-free tracking,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2113–2120 (2015). https://doi.org/10.1109/CVPR.2015.7298823

Biography

Junhua Yan is a professor at Nanjing University of Aeronautics and Astronautics. She was an academic visitor at the University of Sussex (October 31, 2016–October 30, 2017). She received her BSc, MSc, and PhD degrees from Nanjing University of Aeronautics and Astronautics in 1993, 2001, and 2004, respectively. She is the author of more than 40 journal papers and holds 5 patents. Her current research interests include image quality assessment, multisource information fusion, target detection, tracking, and recognition.

Jun Du is a graduate student at Nanjing University of Aeronautics and Astronautics. Her research interest is target tracking. She received her BSc degree from Nanjing University of Aeronautics and Astronautics in 2016.

Yong Young received his BSc degree and MSc degree from Nanjing University of Aeronautics and Astronautics in 2015 and 2018, respectively. His research interest is target tracking. Currently, he is working at HIKVISION in China.

Christopher R. Chatwin is a director of the IISP research group. He has published a total of 220 journal papers, 230 conference papers, 10 book chapters, 2 books, and 112 technical reports. His patents (PCT-PN-WO03/073366, US10/504771, JPN2003-571984, EU03708323.5, and UK2389901) form the basis for a new patent application in labelling technology for brand protection. He is a course convenor for the new MSc in security technologies and systems. He is a PhD external examiner at the universities of Cambridge, Hull, Glasgow, Liverpool, Cairo, Singapore, Ghent, and Lulea.

Rupert C. D. Young is a reader and the head of the department. He has published a total of 120 journal papers and 133 conference papers. He is an editor of the Asian Journal of Physics. He has served on the organizing committee and chaired sessions for the Optical Pattern Recognition conference over the last 10 years, as a part of the annual SPIE Defense and Security Symposium, Orlando, Florida. He is a member of the OSA and SPIE.

Philip Birch is a senior lecturer. He has published a total of 70 journal papers and 80 conference papers, and has made major contributions to the research group in optoelectronics and image processing. He left for 3 years to work as a project manager in a start-up company called Spiral Scratch Ltd. He also acted as the Liverpool University KTP facilitator and developed links with Sheffield Hallam University.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Junhua Yan, Jun Du, Yong Young, Christopher R. Chatwin, Rupert C. D. Young, and Philip Birch "Real-time unmanned aerial vehicle tracking of fast moving small target on ground," Journal of Electronic Imaging 27(5), 053010 (14 September 2018). https://doi.org/10.1117/1.JEI.27.5.053010
Received: 6 April 2018; Accepted: 16 August 2018; Published: 14 September 2018
Keywords: Image filtering, Unmanned aerial vehicles, Electronic filtering, Optical filters, Target detection, Motion models, Optimal filtering
