1. Introduction

With the continuous growth of the global population, large-scale public gatherings—such as religious pilgrimages, parades, celebrations, and sporting events—have become more frequent; therefore, crowd management at these gatherings has become an urgent and important problem in the field of public safety management. Usually, people in these gatherings move in a limited space, such as urban streets, semienclosed squares, and fully enclosed shopping malls, which are prone to catastrophic accidents such as crowd congestion and trampling. To effectively supervise and guide crowds at such gatherings and avoid accidents, video/image crowd monitoring technology has been widely used in video surveillance systems in many public places.1–8 Considering that the most common cause of such accidents is collision among crowds moving in different directions, dividing moving crowds into subgroups based on their movement direction is particularly important; this has been an important research topic in the field of image processing and computer vision.9 In recent years, researchers have proposed a variety of methods for motion segmentation of crowds at different congestion levels in complex scenes;10–12 these can be classified as traditional methods based on single-person tracking13,14 and motion modeling methods that model the overall motion characteristics of the crowd.15–26 Among them, segmentation methods based on single-person tracking rely on the complete trajectories of all individuals (or feature points and feature areas) in the crowd; therefore, they are only suitable when there is no or only slight occlusion. As crowd density increases, occlusion becomes severe and the acquisition of complete trajectories becomes very difficult, resulting in a significant drop in the performance of such methods.
Due to the difficulties encountered in tracking individuals (such as occlusion), and considering the overall self-organization characteristics of a moving crowd, many researchers have started to focus on the entire crowd instead of the individuals in it so that an appropriate motion model can be adopted to model the overall motion of the crowd. At present, studies mainly focus on two kinds of methods for crowd motion modeling: flow-based models and nonflow-based models.9 Because a flow field such as the optical flow field is usually a good estimate of the motion field, crowd motion segmentation based on the flow field model was proposed as one of the earliest approaches to crowd motion segmentation. It is the basis for a variety of subsequently proposed methods. In crowd motion segmentation methods based on the flow model, apart from the most commonly used optical flow fields,15,17–21 flow fields used for crowd motion modeling also include the particle flow field22,23 and the streak flow field.24–26 Most existing flow-based models only focus on a specific crowd density. For example, particle flow fields and streak flow fields are, generally, only suitable for a high-density slow-moving crowd. When this kind of model is applied to a low-density crowd, the possibility of oversegmentation is high. To make the flow field-based motion model applicable to crowds of any density, and considering that different flow fields are essentially vector fields, some researchers introduced vector domain segmentation methods for specific vector distributions in the flow field model, such as the local translation domain segmentation (LTDS) proposed by Wu and Wong.20 Although LTDS can be applied for motion segmentation of crowds with different densities, it is prone to undersegmentation between the foreground and background because its objective function considers only the direction differences between vectors, regardless of the magnitude differences.
Among the nonflow-based methods, the most representative are dynamic texture-based methods27–30 and tracklet-based methods.31–34 Dynamic texture-based methods first model the moving crowd as a dynamic texture35 with spatiotemporal statistical properties, and then use the matching among model parameters to perform crowd motion segmentation or abnormal behavior detection. As current dynamic texture models are relatively simple (such as the linear dynamic system), crowd motion segmentation methods based on dynamic texture are currently only applicable to crowds with relatively simple movement modes and medium/low densities. Tracklet-based methods first obtain the tracklets of the keypoints in the crowd using a tracker [such as Kanade–Lucas–Tomasi (KLT)], and then apply a similarity measure among the tracklets to complete the crowd motion segmentation. For example, Sharma and Guha34 used the tracklets acquired by KLT to form complete trajectories of keypoints, and then applied the trajectory clustering approach (TCA) to complete the segmentation of moving crowds. However, this method is not suitable for short-term motion segmentation owing to its requirement for long-term trajectories of the keypoints in the crowd. Zhou et al.31,32 and Fan et al.33 utilized coherent filtering (CF) based on coherent neighbor invariance to perform short-term crowd motion segmentation via the local spatiotemporal relationships and motion correlations among tracklets. However, because coherent neighbor invariance primarily focuses on pairwise motion consistency and ignores the motion differences among all the keypoints in the local region of the center point, undersegmentation occurs easily in crowds with numerous motion patterns. In this study, a translation domain segmentation (TDS) method based on improved cosine similarity (ICS) is proposed for crowd motion segmentation.
This method adopts an optimized vector domain segmentation model in the flow field and can be applied to the segmentation of crowds with different crowding levels and complex motion modes. The main contributions of this study are as follows:
2. Related Works

To overcome the problems of traditional methods based on individual tracking, researchers have performed crowd motion segmentation by establishing a series of crowd motion models according to the self-organization characteristics of a moving crowd. Currently, most crowd motion models are based on flow fields. Apart from the most commonly used optical flow field,15,17–21 the flow fields used for modeling a moving crowd also include the particle flow field22,23 and the streak flow field.24–26

2.1. Optical Flow Field

Research on modeling a moving crowd by optical flow is the most developed. For example, Hu et al.15 used a Gaussian adaptive resonance theory network to extract the prominent flow vectors from a dense optical flow field and then constructed a directed neighborhood graph based on the shortest path search for these flow vectors. A hierarchical agglomerative clustering algorithm was used to segment the directed neighborhood graph to obtain the final segmentation results of a moving crowd. Zhang et al.16 obtained trajectory chains of feature points using an orientation distribution function in a sparse optical flow field; then, they used a spectral clustering method (number of clusters determined by prior knowledge) to complete the classification of trajectories to perform slow-motion segmentation in densely crowded locations. This type of approach usually establishes a crowd motion model directly based on the similarity among optical flow vectors. Because the model is relatively simple, it is generally only applicable to a crowd with good motion consistency.

2.2. Particle Flow Field

The particle flow field is based on the Lagrangian fluid dynamics framework. When used for motion segmentation, the particle trajectory can be estimated by numerical integration through the moving particle grid in the optical flow field.
For example, Ali and Shah22 first used the Lagrangian particle dynamics (LPD) algorithm to estimate the particle flow field based on the optical flow field, and then used the finite-time Lyapunov exponent to extract the boundaries among different flow fields to obtain the segmentation results for a crowd. Because a particle flow field ignores spatial-domain variation and its time delay is obvious, a crowd motion segmentation method based on the particle flow field is only suitable for a high-density slow-moving crowd.

2.3. Streak Flow Field

To obtain a better flow-based model for a moving crowd, Mehran et al.24 introduced the streak line in fluid dynamics to estimate the motion field of a crowd, which is called streak flow. Because streak flow can preserve the motion information of the flow for a period, its segmentation performance is better than that of particle flow when applied to crowds with obvious motion changes. However, the streak flow field is obtained on the basis of the optical flow field, and its calculation requires very high-accuracy optical flow, which is difficult to achieve with conventional methods.

2.4. Vector Domain Segmentation for Flow Field Model Application

To make the flow field model suitable for crowds with different densities and motion modes, some researchers have used the vector domain segmentation model36,37 in the vector field to establish a flow-based model of a moving crowd, such as the LTDS model proposed by Wu and Wong.20 Through the normalization of the vector magnitude and the addition of a foreground area constraint, LTDS extends the TDS model proposed by Roy et al.36 to the nonunit field; it can be applied to the segmentation of crowds with different densities through localization. However, because the objective function adopted by LTDS only considers the direction difference between vectors, undersegmentation between foreground and background can easily occur.
The proposed crowd motion segmentation method is based on a segmentation model in the optical flow field. The ICS measure is used to replace the similarity measure based on the normalized vector inner product in the objective function of LTDS. The improved objective function can distinguish both the magnitude and direction of vectors to achieve better segmentation performance.

3. Method

The overall architecture of the crowd motion segmentation method proposed in this study is shown in Fig. 1. When applying the algorithm to a high-density slow-moving crowd: (1) the optical flow field between two consecutive frames is first obtained by means of an optical flow extraction algorithm (such as the Brox algorithm in Ref. 38); (2) the local ICS is used to obtain the local motion consistency map (LMCM); (3) the initial regions for level set evolution are extracted from the LMCM; (4) through the level set algorithm, the objective function of ICS-based LTDS is used to complete the region evolution in the LMCM, and regions with multiple motion directions are obtained; (5) the final crowd segmentation results are obtained by merging the regions with the same direction. In contrast, when the algorithm is used for crowds with medium or low densities and complex motion modes, after the initial region extraction step, region evolution is performed not in the LMCM but in the global motion consistency map (GMCM) corresponding to each initial region. The GMCM is obtained by calculating the ICS between the average vector of each initial region and all the vectors.

3.1. TDS and LTDS

Given a unit vector field , the translation domain (TD) is defined as the region in which all vectors have the same direction. The field lines in a TD are a set of parallel lines. Therefore, there must exist a unique normal vector perpendicular to the field lines, denoted as , which can be used as a characteristic parameter to represent the TD. It is also known as the dominant translation parameter.
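The five numbered steps above can be sketched end to end on a toy flow field. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the Brox optical flow is replaced by a synthetic field, plain cosine similarity stands in for the ICS developed later, and the level-set evolution of step (4) is reduced to thresholding the consistency map and binning by direction.

```python
import numpy as np

def cos_sim(u, v, eps=1e-9):
    # plain cosine similarity; 0 for (near-)zero vectors
    mu, mv = np.linalg.norm(u), np.linalg.norm(v)
    if mu < eps or mv < eps:
        return 0.0
    return float(np.dot(u, v) / (mu * mv))

def local_consistency_map(flow):
    # step (2): mean similarity between each vector and its 8 neighbors
    # (a crude stand-in for the LMCM)
    h, w, _ = flow.shape
    lmcm = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            vals = [cos_sim(flow[y, x], flow[y + dy, x + dx])
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w]
            lmcm[y, x] = np.mean(vals)
    return lmcm

def segment(flow, mag_thresh=0.1, cons_thresh=0.5, n_bins=8):
    # steps (3)-(5), simplified: keep consistent moving pixels,
    # then merge pixels that share a direction bin
    lmcm = local_consistency_map(flow)
    mag = np.linalg.norm(flow, axis=2)
    moving = (mag > mag_thresh) & (lmcm > cons_thresh)
    angle = np.arctan2(flow[..., 1], flow[..., 0])
    bins = np.floor((angle + np.pi) / (2 * np.pi / n_bins)).astype(int) % n_bins
    return np.where(moving, bins + 1, 0)  # label 0 = background

# synthetic flow: one block moves right, another moves up, on a still background
flow = np.zeros((10, 10, 2))
flow[2:8, 1:4] = [1.0, 0.0]
flow[2:8, 6:9] = [0.0, 1.0]
labels = segment(flow)
n_segments = len(np.unique(labels[labels > 0]))
```

The two blocks end up with distinct labels because their dominant directions fall into different bins, while the still background stays at label 0.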
Roy et al.36 proved that a vector domain is a TD if and only if there is a unique vector , for , that satisfies the following equation: According to Eq. (1), Roy et al.36 proposed a TDS model for vector domain segmentation. The corresponding objective function of the model is as follows: where is a TD; is an evolution parameter defined to facilitate the use of the active contour algorithm; is used to measure the motion consistency between all vectors in and the TD determined by , which can be called the GMCM corresponding to ; the integral term represents the sum of the squares of the errors in in the GMCM, which reflects the motion consistency of ; the integral term represents the area of ; and is a positive constant. On the basis of Eq. (2), given a deformation velocity , the Gâteaux derivative in the direction of can be given as follows: where is the boundary of , is the integral variable of , and is the unit normal vector of pointing in the direction of the interior of . Then, Eq. (2) is minimized by applying the active contour algorithm according to Eq. (3); the corresponding evolution equation of the active contour algorithm is as follows: where is , which is the contour of the TD for evolution, and is its initial value. Meanwhile, Roy et al.36 also proposed an estimation method for the dominant translation parameter , which takes the first term of the right side of Eq. (2) as the objective function and obtains the optimal estimate of by minimizing it, as follows: As is a real symmetric matrix, the optimal estimation of can be determined as the eigenvector corresponding to the smallest eigenvalue of via the quadratic programming model. Because the TDS model is proposed for the TD in an ideal unit vector field, it cannot be directly applied to the segmentation of an actual motion field.
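The eigenvalue-based estimate of the dominant translation parameter described above can be illustrated numerically: minimizing the sum of squared projections of the field vectors onto a unit vector reduces to taking the eigenvector of the real symmetric scatter matrix (the sum of the outer products of the vectors) associated with its smallest eigenvalue. A minimal sketch (the function name and 2-D setup are ours, not the paper's):

```python
import numpy as np

def dominant_translation_parameter(vectors):
    """Estimate the unit normal c minimizing sum((c^T v)^2) over all field
    vectors v: the smallest-eigenvalue eigenvector of M = sum(v v^T)."""
    V = np.asarray(vectors, dtype=float)   # N x 2 stack of flow vectors
    M = V.T @ V                            # 2 x 2 scatter matrix, real symmetric
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, 0]                   # eigenvector for the smallest eigenvalue

# a translation domain whose vectors all point along +x: the normal is along ±y
c = dominant_translation_parameter([[1.0, 0.0], [2.0, 0.0], [0.5, 0.0]])
```

Because the vectors all point along the x axis, the recovered normal is (0, ±1), perpendicular to the field lines as the TD definition requires.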
To make the TDS model applicable to crowd motion segmentation, Wu and Wong20 established an LTDS model through normalization of the vector, addition of a foreground motion region area constraint, and localization according to Eq. (2). The objective function of LTDS is as follows: where is a local TD; is a motion field and its magnitude is mapped to the range [0, 1]; is a neighborhood centered on and is the area of ; is the dominant translation parameter of and can be estimated according to Eq. (5); and is the mean of the sum of the squared errors inside , reflecting the local motion consistency. Given the size of , we can calculate of each vector in and use them to form an LMCM, which can better describe the consistent slow motion of high-density crowds than the GMCM; is a constant; is proportional to , and is used to measure the extent to which belongs to the foreground, so that can be considered as the foreground area constraint. Similar to the evolution equation [Eq. (4)] of TDS, and considering that is an even function, which makes it impossible to distinguish two completely opposite vectors, the evolution equation of the active contour algorithm corresponding to the LTDS model is as follows: where is the average vector of the region .

3.2. ICS

It is well known that the cosine similarity (CS) measure uses the cosine of the angle between two vectors in the vector space to measure the difference between them, $\mathrm{CS}(\mathbf{u},\mathbf{v})=\cos\theta=\frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$, where $\theta$ is the angle between vectors $\mathbf{u}$ and $\mathbf{v}$. The closer the cosine is to 1 (i.e., the closer the angle is to 0), the more similar the two vectors are.
However, as the CS normalizes the magnitudes of the two vectors during calculation, it is impossible to measure the difference between the magnitudes of the two vectors. To measure both the magnitude and direction difference between the two vectors using CS, we first add a scale factor that can measure the difference between the magnitudes of the two vectors in the original CS, as shown in Eq. (8); this is defined as follows: Considering that the range of is [0, 1] and that of CS is [−1, 1], to unify the ranges of both values, by linear mapping, the CS is adjusted to Based on Eqs. (9) and (10), the ICS is given as follows: The basic ICS defined in Eq. (11) cannot be directly used for TD segmentation because there is an imbalance when evaluating the magnitude and direction difference. For example, if the similarity is given as 0.5, when (i.e., there is almost no difference in magnitude between the two vectors) and can be obtained from Eqs. (10) and (11), then ; however, when (i.e., there is almost no difference in direction between the two vectors), , then . This shows that the ICS between two vectors with a small difference in direction and half the difference in magnitude is the same as the ICS between two vectors that are perpendicular to each other and have the same magnitude, as shown in Fig. 2(a). Considering that there are two vectors perpendicular to a given vector in the same plane, .

Fig. 2 Imbalance of vectors with the same ICS defined in Eq. (11). (a) The ICS between two vectors with no difference in direction but half the difference in magnitude [i.e., ] is the same as the ICS between two vectors perpendicular to each other with the same magnitude [i.e., and ]. (b) An optical flow field containing optical flow vectors having the relationship described in panel (a) (the vector in red is the given reference vector ).

For the actual application of this study, , , and should not be the same.
Although the magnitude of is quite different from that of , they can be considered to have good similarity because they have the same directional angle (i.e., they have exactly the same direction of motion). In contrast, both and are perpendicular to , resulting in a large difference in their direction of motion. In this case, even if they have the same magnitude, the similarity between them should be very low. Otherwise, two vectors with perpendicular or even opposite directions may be segmented into the same region, i.e., all the vectors in Fig. 2(b) would be segmented into the same region. That is, when segmenting vectors, the importance of the similarity between the directional angles is significantly higher than that of the similarity between the magnitudes; for the case shown in Fig. 2, the condition should be satisfied. To enhance the proportion of directional angle similarity in the basic ICS defined in Eq. (11), we add an exponential adjustment factor to that measures the directional angle difference in ICS; the ICS is then redefined as follows: where . The larger is, the greater the proportion of the directional angle similarity in ICS. Meanwhile, if the minimum similarity is given, i.e., , the larger is, the smaller the variation range (defined as ) of the angle difference , as shown in Fig. 3.

Fig. 3 Relationship between and . When the minimum similarity is given, i.e., , the larger is, the smaller will be .

The value of can be determined according to the distribution of the vectors in the optical flow field, especially the distribution of the vector direction, by using the maximum allowable at the minimum similarity . For example, the minimum similarity is given as 0.5 and is given as at this similarity [as shown in Fig. 4(a)]; when , takes the minimum value of 0.5 and the corresponding direction angle has the maximum value of , so the following equation holds: By solving Eq. (13), we can obtain .
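Because the exact symbols are not reproduced here, the following sketch assumes one plausible composition consistent with the text: a magnitude ratio in [0, 1], the CS linearly remapped to [0, 1], and the exponent applied to the remapped direction term. Under this assumption, the imbalance of Fig. 2 appears at an exponent of 1 and shrinks as the exponent grows; the function name and the assumed formula are ours, not the paper's.

```python
import math

def ics(u, v, n=1.0, eps=1e-12):
    """Hypothetical ICS: magnitude ratio s = min(|u|,|v|)/max(|u|,|v|) in [0, 1]
    times the CS remapped to [0, 1] and raised to the power n (larger n weights
    the direction difference more heavily)."""
    mu, mv = math.hypot(*u), math.hypot(*v)
    if mu < eps or mv < eps:
        return 0.0
    s = min(mu, mv) / max(mu, mv)
    cos_t = (u[0] * v[0] + u[1] * v[1]) / (mu * mv)
    return s * ((cos_t + 1.0) / 2.0) ** n

# the Fig. 2 imbalance at n = 1: both pairs score 0.5 ...
a1 = ics((1.0, 0.0), (0.5, 0.0), n=1)   # same direction, half magnitude
b1 = ics((1.0, 0.0), (0.0, 1.0), n=1)   # perpendicular, same magnitude
# ... and its mitigation at larger n: the perpendicular pair drops sharply
a4 = ics((1.0, 0.0), (0.5, 0.0), n=4)
b4 = ics((1.0, 0.0), (0.0, 1.0), n=4)
```

With the exponent at 1 the two pairs are indistinguishable (both 0.5); at 4 the same-direction pair keeps its score while the perpendicular pair falls to 0.0625, which is the qualitative behavior the exponential adjustment factor is introduced to achieve.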
Once is determined, for a given minimum similarity , is limited to irrespective of the change in , as shown in the shaded area in Fig. 4(b). In other words, when , in the limit case, the ICS with and is the same as the ICS with and , i.e., [as shown in Fig. 4(a)]; thus, the extreme imbalance between the magnitude difference and the direction difference shown in Fig. 2 is avoided. In this manner, when using the ICS to perform the segmentation of a TD, if the minimum similarity is given, the maximum directional angle difference among all vectors in the TD is restricted to , and the normalized maximum magnitude difference is limited to [0.5, 1].

3.3. ICS-Based TDS and ICS-Based LTDS

It can be seen from Eqs. (2) and (6) that in the objective function of LTDS is similar to the term in the objective function of TDS; they essentially use the CS to measure whether the vector to be segmented is perpendicular to the dominant translation parameter of . In the actual application process, when is constantly evolving, its approximation to a unit TD worsens, and the difference between the magnitudes of the vectors may also increase. In this case, if the magnitude difference is not equalized owing to improper selection of , undersegmentation may occur, i.e., regions with vectors having a small direction difference but a large magnitude difference may be segmented into one region, as shown in Fig. 5. In an actual motion field, the difference in magnitude is often an important basis for distinguishing the foreground from the background; however, the error constraints in the objective functions of both TDS and LTDS do not consider the magnitude difference between vectors, which causes the segmentation results to include more background in the motion regions.

Fig. 5 Undersegmentation caused by the magnitude difference between the foreground and background.
On the left, the vectors of the foreground (gray shaded region) and background (green shaded region) have no difference in direction, but they have an obvious difference in magnitude. When the initial region of the active contour algorithm is inside (as indicated by the red dashed line on the left), if the magnitude difference is not equalized owing to improper selection of , the evolution result (the area within the red border on the right) will contain both the foreground and the background, i.e., .

Because ICS can measure differences in both the direction and magnitude of vectors, it can be used to improve the objective functions of LTDS and TDS and thus the segmentation performance. The key is to use ICS to redefine the error constraint term in the objective functions of LTDS and TDS. Referring to Eq. (2), the objective function of the TDS redefined by ICS (ICS-TDS) is as follows: where has the same direction as the field lines of the TD . Similar to , can also be used as the dominant translation parameter to represent . However, unlike , because the magnitude of is involved in the minimum and maximum calculation, its optimal estimation cannot be obtained by the quadratic programming algorithm. Nevertheless, we can use the average vector of all vectors in to approximate according to the definition of , i.e., Although Eq. (15) does not give the optimal estimate of , its acquisition process is greatly simplified compared to that of ; the time complexity is significantly reduced from to . After defining the objective function of ICS-TDS given by Eq. (14), and referring to the localization method in Eq. (6), we can obtain the objective function of ICS-based LTDS (ICS-LTDS) by redefining with ICS: where is the mean of the sum of the squares of the ICS-based errors in the neighborhood , which can be used to form the ICS-based LMCM; the definitions of the other parameters are the same as those in Eq. (6).
It should be noted that the purpose of localization is to prevent the global parameter or from affecting the error estimation in the local range; thus, the estimate of in should not be the average vector of defined by Eq. (15), but the average vector of , as follows: Similar to TDS and LTDS, the minimization of the objective functions in Eqs. (14) and (16) can also be performed by the active contour algorithm. The evolution equation of ICS-TDS can be obtained by rewriting Eq. (4) as follows: As is not an even function, vectors with opposite directions can be well distinguished. Therefore, the evolution equation of the active contour algorithm corresponding to ICS-LTDS can be rewritten by referring to Eq. (7):

3.4. Acquisition of Initial Regions

For the active contour algorithm, the selection of the initial regions has a significant impact on the final segmentation results. Segmentation results obtained from erroneous initial regions may be intertwined with the correct segmentation results and therefore cannot be eliminated. To obtain suitable initial regions for evolution, Wu and Wong20 first selected the minimum point in the LMCM and then calculated the mean of and the mean optical flow magnitude in the neighborhood of . Only the neighborhood of a point with and higher than given thresholds, which indicates that the motion consistency in its neighborhood is good enough and the possibility that its neighborhood is the foreground is sufficiently high, can be considered as an initial region. However, when this initial region selection is used for medium- or low-density crowds, if the optical flow estimation error leads to some larger magnitudes of optical flow in the background, some erroneous initial regions may be selected in the background, as shown in Fig. 6(b). In Fig. 6(b), the initial regions in the red circle are all erroneous initial regions.

Fig. 6 Example of the acquisition of initial regions.
(a) The minimum (red points) extraction results in the LMCM; (b) the initial regions extracted by and , where the initial regions within the red circle are erroneous initial regions in the background; and (c) the initial regions extracted after the introduction of , where all the erroneous initial regions in (b) are removed.

Although the magnitudes of optical flow in these erroneous regions are relatively large, the gray-level changes in these background regions are actually small and can be easily estimated by the frame difference method. Therefore, in addition to and , we introduce the regional mean based on the frame difference for initial region determination. Only when , , and are all higher than the given thresholds, which are, respectively, called the local error threshold , local motion magnitude threshold , and local frame difference magnitude threshold , can the neighborhood of the minimum point be confirmed as an initial region. The initial regions extracted after the introduction of can be seen in Fig. 6(c). By comparing Figs. 6(b) and 6(c), it can be seen that most of the erroneous initial regions in the background are removed.

3.5. Algorithm Flow of the Proposed Method

As mentioned above, the ICS-based LMCM is more suitable for segmenting high-density crowds with slow consistent motion because it is based on local ICS error constraints that can well reflect the motion consistency between local regions. In contrast, the ICS-based GMCM is based on the ICS error constraints between all vectors in the motion field and a given TD; therefore, it is more suitable for segmenting motion regions that are consistent with the given TD. If the initial regions contain all possible motion modes, the GMCMs corresponding to all the initial regions can be used to segment a moving crowd with medium or low density and complex motion modes. In summary, the detailed flow of the proposed crowd motion segmentation algorithm based on ICS-LTDS/ICS-TDS is shown in Algorithm 1.
Algorithm 1 Proposed algorithm.
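The three-threshold initial-region test of Sec. 3.4 can be sketched as follows. The function name, the threshold values, and the toy data are illustrative, not the paper's: a candidate neighborhood is accepted only if its mean consistency, mean optical flow magnitude, and mean frame difference all exceed their thresholds, and the frame-difference test is what rejects background regions whose optical flow is large only through estimation error.

```python
import numpy as np

def select_initial_regions(consistency, flow_mag, frame_diff, radius=2,
                           t_err=0.5, t_mag=0.1, t_fd=0.05):
    # A window qualifies only if mean consistency, mean flow magnitude, and
    # mean frame difference all exceed their thresholds; the third test
    # rejects background whose optical flow is large only through noise.
    h, w = consistency.shape
    centers = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            win = (slice(y - radius, y + radius + 1),
                   slice(x - radius, x + radius + 1))
            if (consistency[win].mean() > t_err
                    and flow_mag[win].mean() > t_mag
                    and frame_diff[win].mean() > t_fd):
                centers.append((y, x))
    return centers

# true motion (flow + frame change) vs. a spurious flow blob with no frame change
cons = np.ones((12, 12))
mag = np.zeros((12, 12)); mag[2:7, 2:7] = 1.0; mag[7:12, 7:12] = 1.0
fd = np.zeros((12, 12)); fd[2:7, 2:7] = 1.0
centers = select_initial_regions(cons, mag, fd)
```

The first blob (flow plus frame difference) yields accepted centers, while the second blob, whose optical flow is not corroborated by any frame difference, is rejected — mirroring how the erroneous background regions of Fig. 6(b) are removed in Fig. 6(c).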
4. Experiments and Discussion

In this study, two series of experiments were conducted to verify the performance of Algorithm 1. In the first series of experiments, the proposed ICS-LTDS algorithm was applied to the segmentation of high-density slow-moving crowds. In the second series of experiments, the proposed ICS-TDS algorithm was used to segment medium- or low-density crowds with complex motion modes. Most crowd video/image sequences used in the experiments were taken from two public datasets, i.e., the UCF dataset22 and the UCSD dataset,28 and a small part from YouTube (https://www.youtube.com/). In addition, methods based on flow field models, specifically, the LTDS algorithm,20 TDS algorithm,36 LPD algorithm,22 streak flow-based algorithm,24 and the prior knowledge-based trajectory tracking algorithm,16 as well as recently proposed typical nonflow-based methods, specifically, the CF algorithm32 and TCA,34 were applied to the same datasets for performance comparison. During the experiments, all algorithms were run on the MATLAB 8.6 platform in the Windows 10 Pro environment. To evaluate the segmentation performance of the algorithms, the ground truth was set manually for some representative images. Meanwhile, on the basis of the ground truth, the numerical segmentation accuracy (SA) corresponding to different directions was computed by means of the Jaccard similarity coefficient39 among sets. The numerical SA is calculated as follows: where represents the index of the regions corresponding to different motion directions, is the segmentation result corresponding to , and is the ground truth region corresponding to .

4.1. Segmentation of High-Density Slow-Moving Crowds

Algorithm 1 is applied to the segmentation of high-density slow-moving crowds, as described in Sec. 3.5, in the LMCM by using the evolution equation [Eq. (19)] based on the objective function of ICS-LTDS (). In all experiments, if the size of the entire image is mapped to , the size of the neighborhood is .
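The SA of Eq. (20) is the per-direction Jaccard coefficient: the overlap of a segmented region with its ground truth divided by their union. A minimal sketch over pixel-coordinate sets (the function name and toy regions are ours):

```python
def segmentation_accuracy(segmented, truth):
    """Per-direction SA as the Jaccard coefficient |S ∩ G| / |S ∪ G| of the
    segmented pixel set S and the ground-truth pixel set G."""
    if not segmented and not truth:
        return 1.0
    return len(segmented & truth) / len(segmented | truth)

truth = {(r, c) for r in range(4) for c in range(4)}    # 4x4 ground-truth block
pred = {(r, c) for r in range(4) for c in range(1, 5)}  # same block, shifted one column
sa = segmentation_accuracy(pred, truth)                 # 12 / 20 = 0.6
```

A one-column misalignment of a 4x4 region leaves 12 shared pixels out of a 20-pixel union, so the SA drops to 0.6; a perfect match would score 1.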
The initial region of evolution is centered on the minimum of the LMCM and the size is . The thresholds and are set to 0.06 and 2, respectively. When the threshold is used for different image sequences, the set values are between 1.3 and 3.4. The window size for calculating , , and is . The constant in is set to 0.1. The constant is given by referring to LTDS:20 where is the angle difference between the two regions to be segmented, which can be given first by , and adjusted according to the actual situation; and are the average values of and in the initial region. For the settings of LTDS, parameters that are similar to those of ICS-LTDS, such as the size of , , , , and , are all set to the same values as in ICS-LTDS for performance comparison. The representative segmentation results of the flow field model-based methods (i.e., ICS-LTDS, LTDS, LPD, streak flow, and prior knowledge-based trajectory tracking) for four image sequences of high-density crowds with slow motion are shown in Fig. 7. The SA calculated according to Eq. (20) is shown in Table 1.

Fig. 7 Image sequences of high-density slow-moving crowds used for testing: (a) Mecca, (b) pilgrims, (c) marathon, and (d) kabba. Representative segmentation results (top to bottom): ground truth images, segmentation results of the proposed ICS-LTDS, segmentation results of LTDS, segmentation results of LPD, segmentation results of streak flow, and segmentation results of prior knowledge-based trajectory tracking.

Table 1 Segmentation accuracies of the proposed ICS-LTDS, LTDS, LPD, streak flow, and prior knowledge-based trajectory tracking for the scenes in Fig. 7.
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

To compare the segmentation results of the proposed ICS-LTDS algorithm with those of typical nonflow-based methods in scenarios with high-density slow-moving crowds, we selected pilgrims and marathon as test sequences and applied the proposed ICS-LTDS, TCA,34 and the CF-based algorithm32 to them. Representative segmentation results are shown in Fig. 8, and the segmentation accuracies calculated according to Eq. (20) are shown in Table 2. For TCA, the following parameter settings were used: , (10 for pilgrims), , , , , , and ; for the CF algorithm, , , the upper bound of was taken as 1, and the threshold coefficient was 0.5.

Fig. 8 Image sequences of high-density slow-moving crowds used for testing (top to bottom): pilgrims and marathon. Representative segmentation results: (a) ground truth images; (b) segmentation results of the proposed ICS-LTDS; (c) segmentation results of TCA; and (d) segmentation results of the CF algorithm.

Table 2 Segmentation accuracies of the proposed ICS-LTDS, TCA, and CF algorithm for the scenes in Fig. 8.
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

Similar to crowd flow in high-density slow-moving crowds, the proposed ICS-LTDS algorithm can also be applied to slow-moving traffic flow or mixed flow. We selected two image sequences for algorithm evaluation in this noncrowd scenario. The flow-based ICS-LTDS, LTDS, and LPD and the nonflow-based TCA were evaluated and compared in the same noncrowd scenes. Representative segmentation results for the two image sequences of noncrowd scenes are shown in Fig. 9. The segmentation accuracies calculated according to Eq. (20) are shown in Table 3.

Fig. 9 Image sequences of noncrowd scenes used for testing (top to bottom): roundabout and highway traffic. Representative segmentation results: (a) ground truth images; (b) segmentation results of the proposed ICS-LTDS; (c) segmentation results of LTDS; (d) segmentation results of LPD; and (e) segmentation results of TCA.

Table 3 SA of the proposed ICS-LTDS, LTDS, LPD, and TCA for the scenes in Fig. 9.
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

Further, we manually obtained the ground truth of 100 consecutive frames from the marathon sequence to evaluate performance on a sequence containing a slow-moving crowd. The segmentation results of five representative consecutive frames of the three best-performing flow-based algorithms (i.e., the proposed ICS-LTDS, LTDS,20 and LPD22) and the nonflow-based CF algorithm32 are shown in Fig. 10. The highest, lowest, and average segmentation accuracies of all 100 frames in the entire sequence are shown in Table 4.

Fig. 10. Segmentation results of five representative consecutive frames (top to bottom) in the marathon sequence: (a) ground truth; (b) segmentation results of the proposed ICS-LTDS; (c) segmentation results of LTDS; (d) segmentation results of LPD; and (e) segmentation results of the CF algorithm.

Table 4. Highest, lowest, and average segmentation accuracies of all 100 frames in the marathon sequence.
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

From the results in Figs. 7–10 and Tables 1–4, the following observations can be made:

(1) Because the principles of particle flow and streak flow are similar (both process motion information recorded over a period), the segmentation results of LPD and streak flow are similar. The proposed ICS-LTDS is an improvement of LTDS, so the segmentation results of these two methods are also similar.

(2) Although particle flow and streak flow are highly suitable for segmenting slow-moving targets, when the speed of a target decreases and tends toward zero, both methods assume that the target has not moved for a period and thus abandon its extraction. For example, in the highway traffic sequence (last row in Fig. 9), the traffic that decelerates and gradually stops because of congestion is not segmented by LPD. In contrast, because ICS-LTDS and LTDS are based on real-time optical flow, both segment these targets.

(3) Because the prior-knowledge-based trajectory tracking method operates on the sparse optical flow field formed by feature points, it is only suitable for motion segmentation in extremely crowded scenes. For a scene where feature points are not easily extracted or the crowd is loosely distributed, missed segmentation is highly likely, as shown in the last row of Figs. 7(a)–7(d).

(4) As discussed in Sec. 3.3, because the objective function of LTDS distinguishes only the direction differences between vectors, regardless of their magnitude differences, it tends to produce undersegmentation between foreground and background. This phenomenon is particularly evident in the segmentation results for the roundabout and highway traffic sequences in Fig. 9(c) and the marathon sequence in Fig. 10(c), and it results in a lower SA than that of ICS-LTDS with its improved objective function.
(5) Because TCA needs to record the complete trajectories of the keypoints, it cannot cope with short-term changes in the moving regions. This makes it prone to undersegmentation between the foreground and background, as shown in Figs. 8(c) and 9(e).

(6) Except for the pilgrims and kabba sequences, the SA of ICS-LTDS is higher than that of the other methods, indicating that the ICS-LTDS algorithm is better suited to segmenting high-density slow-moving crowds than existing flow-based and nonflow-based crowd segmentation algorithms.

4.2. Segmentation of Medium- or Low-Density Crowds with Complex Motion Modes

As described in Sec. 3.5, for medium- or low-density crowds with complex motion modes, Algorithm 1 performs the evolution in the GMCM corresponding to each initial region using the evolution equation [Eq. (18)] based on the objective function of ICS-TDS. In all experiments, the average vector of the initial region is used to calculate the GMCM, and the remaining parameters are kept consistent with ICS-LTDS. In LTDS, the evolution of the objective function is performed in the LMCM, which is quite different from ICS-TDS. Therefore, in addition to LTDS, the segmentation results of TDS, which also performs the evolution in the GMCM, are included for comparison. However, because the basic TDS proposed in Ref. 36 is only applicable to a unit vector field, it cannot be directly used in actual scenes such as those shown in Fig. 11. To make TDS applicable to crowd segmentation in an actual motion field, following LTDS, we improve TDS by normalizing the motion vectors and adding the foreground motion area constraint to the objective function of TDS. The TDS defined by Eq. (22) is abbreviated as TDS-G. According to Eqs. (4) and (7), the evolution equation of TDS-G can be defined as in Eq. (23). In addition, among the other methods based on flow field modeling, the particle flow-based LPD is selected for comparison.
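The vector normalization mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `normalize_flow` and the small-magnitude guard `eps` are our own choices.

```python
import numpy as np

def normalize_flow(flow, eps=1e-6):
    """Map each motion vector of an H x W x 2 flow field to unit length,
    so that a unit-vector-field model such as basic TDS can operate on a
    real (non-unit) motion field. The eps guard avoids division by zero
    in static regions (an assumption of this sketch, not from the paper)."""
    mag = np.linalg.norm(flow, axis=-1, keepdims=True)
    return flow / np.maximum(mag, eps)
```

Applied to an estimated optical flow field, this produces the unit-vector field on which a unit-vector objective such as basic TDS is defined, while the foreground motion area constraint would still be added separately in the objective function.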
To facilitate comparison among the algorithms, the common parameters of ICS-TDS, TDS-G, and LTDS are given the same values. Representative segmentation results of ICS-TDS, TDS-G, LTDS, and LPD applied to four image sequences are shown in Fig. 11, and the segmentation accuracies are shown in Table 5.

Fig. 11. Representative segmentation results. Image sequences used for testing (top to bottom): crosswalk, UCSD pedestrians, Fudan pedestrians, and multidirectional pedestrians (MD pedestrians). (a) Ground truth images; (b) segmentation results of the proposed ICS-TDS; (c) segmentation results of TDS-G; (d) segmentation results of LTDS; and (e) segmentation results of LPD.

Table 5. SA of the proposed ICS-TDS, TDS-G, LTDS, and LPD for the scenes in Fig. 11.
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

It should be noted that in the close-up scene containing multidirectional pedestrians shown in the fourth row of Fig. 11, the term in Eq. (20) is difficult to estimate because of the numerous moving directions of the pedestrians, mutual occlusion, and even differences in the moving directions of different parts of the same body, which makes it impossible to calculate the SA separately for different directions according to Eq. (20). Considering that the SA should simultaneously measure the similarity of both the area and the direction of a segment to the ground truth, on the basis of Eq. (20), we add a direction consistency measure so that the mean SA over different directions can be calculated directly without segmenting the ground truth. The SA with the added direction consistency measure is given in Eq. (24), where the direction consistency measure is mapped to [0,1] and calculated by means of CS, as in Eq. (25), from the average vector of the segmented region and the vector of the ground truth at each location. Substituting Eq. (25) into Eq. (24) gives the final form, Eq. (26), which is used to calculate the SA of the results in the fourth row of Fig. 11.

To compare the segmentation results of the proposed ICS-TDS algorithm with those of typical nonflow-based methods for crowds with medium or low density and complex motion modes, we selected crosswalk and MD pedestrians as test sequences and applied the proposed ICS-TDS, TCA,34 and the CF-based algorithm32 to them. Representative segmentation results are shown in Fig. 12, and the segmentation accuracies calculated according to Eq. (20) [Eq. (26) for MD pedestrians] are shown in Table 6.
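Because Eqs. (24)–(26) are not reproduced legibly here, the following sketch shows one plausible reading of the direction-aware accuracy: a Jaccard-style area overlap (the paper cites Jaccard for its accuracy measure) weighted by a cosine-similarity direction term mapped to [0,1]. The function names, the (1 + cos)/2 mapping, and the multiplicative combination are assumptions of this sketch, not the paper's exact definitions.

```python
import numpy as np

def direction_consistency(v_seg, v_gt):
    """Cosine similarity between the mean vector of a segmented region and
    the ground-truth vector, mapped from [-1, 1] to [0, 1]; the (1 + cos)/2
    mapping is an assumption of this sketch."""
    cos = np.dot(v_seg, v_gt) / (np.linalg.norm(v_seg) * np.linalg.norm(v_gt))
    return 0.5 * (1.0 + cos)

def sa_with_direction(seg_mask, gt_mask, v_seg, v_gt):
    """Area overlap (Jaccard index) weighted by the direction-consistency
    term; the multiplicative combination is assumed here, since Eq. (24)
    is not reproduced in this excerpt."""
    inter = np.logical_and(seg_mask, gt_mask).sum()
    union = np.logical_or(seg_mask, gt_mask).sum()
    area = inter / union if union > 0 else 1.0
    return area * direction_consistency(v_seg, v_gt)
```

Under this reading, a segment with perfect overlap and an aligned mean vector scores 1, while an opposite mean direction drives the score to 0, which matches the stated intent of penalizing direction-inconsistent segments without segmenting the ground truth by direction.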
For TCA, the parameter settings were as follows: , , , , , , and ; for the CF algorithm, , , the upper bound of was taken as 1, and the threshold coefficient as 0.5.

Fig. 12. Representative segmentation results. Image sequences used for testing (top to bottom): crosswalk and MD pedestrians. (a) Ground truth images; (b) segmentation results of the proposed ICS-TDS; (c) segmentation results of TCA; and (d) segmentation results of the CF algorithm.

Table 6. Segmentation accuracies of the proposed ICS-TDS, TCA, and CF algorithm for the scenes in Fig. 12.
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

Similar to Sec. 4.1, we manually obtained the ground truth of 100 consecutive frames in the crosswalk sequence to evaluate a sequence containing fast-moving crowds with complex motion modes. The segmentation results of five representative consecutive frames of the three best-performing flow-based algorithms (i.e., the proposed ICS-TDS, TDS-G,36 and LPD22) and the nonflow-based CF algorithm32 are shown in Fig. 13. The highest, lowest, and average segmentation accuracies of all 100 frames in the entire sequence are shown in Table 7.

Fig. 13. Segmentation results of five representative consecutive frames (top to bottom) in the crosswalk sequence: (a) ground truth; (b) segmentation results of the proposed ICS-TDS; (c) segmentation results of TDS-G; (d) segmentation results of LPD; and (e) segmentation results of the CF algorithm.

Table 7. Highest, lowest, and average segmentation accuracies of all 100 frames in the crosswalk sequence.
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

From the results shown in Figs. 11–13 and Tables 5–7, the following observations can be made:

(1) The particle flow-based LPD algorithm is not suitable for segmenting loosely distributed crowds, especially those with complex motion modes. For example, in the segmentation results of crosswalk, shown in the first row of Fig. 11(e) and the last row of Fig. 13(d), there are obvious missing or incorrect segmentations. Meanwhile, in the segmentation results of multidirectional pedestrians, several pedestrians with distinctly different motion directions are segmented into the same region, resulting in severe undersegmentation [as shown in Fig. 14, the zoomed-in version of the last row of Fig. 11(e)].

(2) Because the LMCM is based on the motion consistency between local regions, when pedestrians with little difference in motion direction occlude each other, their motion boundaries are not clear enough in the LMCM, which makes LTDS prone to undersegmentation for mutually occluding pedestrians, as shown in Fig. 15 [the zoomed-in version of the last row of Fig. 11(d)].

(3) Although TDS-G and LTDS have almost the same objective function, TDS-G evolves in the GMCM whereas LTDS evolves in the LMCM. Except for UCSD pedestrians, the segmentation results of TDS-G are better than those of LTDS, indicating that evolution in the GMCM is more suitable for segmenting medium- or low-density crowds with complex motion modes. However, because TDS-G shares with LTDS an objective function that distinguishes only vector direction and not vector magnitude, the undersegmentation between foreground and background caused by the objective function remains serious in the scenes shown in Figs. 11 and 13; therefore, their SAs are significantly lower than that of ICS-TDS.
(4) Because TCA cannot cope with short-term changes in the moving regions, when it is applied to crowd scenes containing complex motion patterns, the segmentation results still contain severe undersegmentation, as shown in Fig. 16(b) [the zoomed-in version of the last row of Fig. 12(c)]. Meanwhile, because the coherent neighbor invariance mainly focuses on pairwise motion consistency and ignores the motion differences among all the keypoints in the local region of the center point, the CF algorithm also easily produces undersegmentation in such scenes, as shown in Fig. 16(c) [the zoomed-in version of the last row of Fig. 12(d)].

(5) For the four scenes shown in Fig. 11, ICS-TDS is superior to the other methods in terms of both segmentation quality and SA, indicating that the ICS-TDS algorithm is better suited to segmenting medium- or low-density crowds with complex motion modes than existing flow-based and nonflow-based crowd motion segmentation algorithms.

Fig. 14. Undersegmentation results produced by LPD: (a) ground truth and (b) segmentation results of LPD. In panels (a) and (b), the same ellipse corresponds to the same pedestrians. In the ground truth, the pedestrians in the blue and yellow ellipses have at least two motion directions, but in the segmentation results of LPD, the pedestrians in each ellipse have the same motion direction.

Fig. 15. Undersegmentation results produced by LTDS: (a) ground truth and (b) segmentation results of LTDS. In panels (a) and (b), the same ellipse corresponds to the same pedestrian(s). In the ground truth, the pedestrians in the blue, yellow, and red ellipses have two motion directions, but in the segmentation results of LTDS, the pedestrians in each ellipse have the same motion direction.

Fig. 16. Undersegmentation results produced by TCA and the CF algorithm: (a) ground truth; (b) segmentation results of TCA; and (c) segmentation results of the CF algorithm.
In panels (a), (b), and (c), the same ellipse corresponds to the same pedestrian(s). In the ground truth, the pedestrians in the blue, yellow, and purple ellipses have at least two motion directions, but in the segmentation results of TCA and the CF algorithm, the pedestrians in each ellipse have the same motion direction.

5. Conclusion

In this study, a TDS model based on ICS was proposed; this model can be used to segment moving crowds with different crowding levels and complex motion modes. The method reconstructs the objective function of the TDS model using ICS so that the objective function simultaneously measures the differences in both magnitude and direction between two vectors; thus, it effectively avoids undersegmentation between the foreground and background caused by magnitude differences. By switching between the “localization” and “globalization” modes of the objective function, i.e., performing the region evolution in the LMCM or the GMCM, the algorithm can be applied to crowds with different densities and motion states. In addition, by simultaneously introducing the local motion magnitude threshold and the local frame difference magnitude threshold, nonforeground regions can be excluded from the initial regions of evolution; thus, the undersegmentation and mis-segmentation caused by erroneous initial regions are greatly reduced. The experimental results showed that, for a variety of complex scenes containing moving crowds, the segmentation performance and accuracy of the proposed ICS-LTDS/TDS-based crowd motion segmentation method are superior to those of existing flow-field-based motion segmentation methods.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61601156, 61701146, and 61871164) and the Zhejiang Provincial Natural Science Foundation of China (Grant No. Q16F010019). The authors declare that they have no competing interests.

References

H. Chaudhry et al.,
“Crowd region detection in outdoor scenes using color spaces,” Int. J. Model. Simul. Sci. Comput. 9(2), 1850012 (2018). https://doi.org/10.1142/S1793962318500125
D. Peleshko et al., “Design and implementation of visitors queue density analysis and registration method for retail video surveillance purposes,” in Proc. IEEE First Int. Conf. Data Stream Min. Process. (DSMP), 159–162 (2016). https://doi.org/10.1109/DSMP.2016.7583531
J. Luo et al., “Real-time people counting for indoor scenes,” Signal Process. 124, 27–35 (2016). https://doi.org/10.1016/j.sigpro.2015.10.036
M. Sajid, A. Hassan, and S. A. Khan, “Crowd counting using adaptive segmentation in a congregation,” in Proc. IEEE Int. Conf. Signal and Image Process. (ICSIP), 745–749 (2016). https://doi.org/10.1109/SIPROCESS.2016.7888363
Z. Zhang, M. Wang, and X. Geng, “Crowd counting in public video surveillance by label distribution learning,” Neurocomputing 166, 151–163 (2015). https://doi.org/10.1016/j.neucom.2015.03.083
X. Hu et al., “A novel approach for crowd video monitoring of subway platforms,” Optik 124(22), 5301–5306 (2013). https://doi.org/10.1016/j.ijleo.2013.03.057
B. Sirmacek and P. Reinartz, “Automatic crowd density and motion analysis in airborne image sequences based on a probabilistic framework,” in Proc. IEEE Int. Conf. Comput. Vision Workshops (ICCV Workshops), 898–905 (2011). https://doi.org/10.1109/ICCVW.2011.6130347
A. G. Abuarafah, M. O. Khozium, and E. AbdRabou, “Real-time crowd monitoring using infrared thermal video sequences,” J. Am. Sci. 8(3), 133–140 (2012).
T. Li et al., “Crowded scene analysis: a survey,” IEEE Trans. Circuits Syst. Video Technol. 25(3), 367–386 (2015). https://doi.org/10.1109/TCSVT.2014.2358029
A. Dehghan et al., “Automatic detection and tracking of pedestrians in videos with various crowd densities,” in Pedestrian and Evacuation Dynamics, 3–19, Springer, Zurich, Switzerland (2012).
H. Ullah et al., “Density independent hydrodynamics model for crowd coherency detection,” Neurocomputing 242, 28–39 (2017). https://doi.org/10.1016/j.neucom.2017.02.023
D. Zhang et al., “High-density crowd behaviors segmentation based on dynamical systems,” Multimedia Syst. 23(5), 599–606 (2017). https://doi.org/10.1007/s00530-016-0520-y
W. Ge, R. T. Collins, and R. B. Ruback, “Vision-based analysis of small groups in pedestrian crowds,” IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 1003–1016 (2012). https://doi.org/10.1109/TPAMI.2011.176
L. Dong et al., “Fast crowd segmentation using shape indexing,” in Proc. 11th IEEE Int. Conf. Comput. Vision, 1–8 (2007). https://doi.org/10.1109/ICCV.2007.4409075
M. Hu, S. Ali, and M. Shah, “Learning motion patterns in crowded scenes using motion flow field,” in Proc. Int. Conf. Pattern Recognit., 1–5 (2008). https://doi.org/10.1109/ICPR.2008.4761183
L. Zhang et al., “Crowd segmentation method based on trajectory tracking and prior knowledge learning,” Arabian J. Sci. Eng. 43, 7143–7152 (2018). https://doi.org/10.1007/s13369-017-2995-z
D. Cremers and S. Soatto, “Motion competition: a variational approach to piecewise parametric motion segmentation,” Int. J. Comput. Vision 62(3), 249–265 (2005). https://doi.org/10.1007/s11263-005-4882-4
T. Brox et al., “Colour, texture, and motion in level set based segmentation and tracking,” Image Vision Comput. 28(3), 376–390 (2010). https://doi.org/10.1016/j.imavis.2009.06.009
A. S. Rao et al., “Crowd event detection on optical flow manifolds,” IEEE Trans. Cybern. 46(7), 1524–1537 (2016). https://doi.org/10.1109/TCYB.2015.2451136
S. Wu and H. S. Wong, “Crowd motion partitioning in a scattered motion field,” IEEE Trans. Syst. Man Cybern. Part B 42(5), 1443–1454 (2012). https://doi.org/10.1109/TSMCB.2012.2192267
Y. Yuan, J. Fang, and Q. Wang, “Online anomaly detection in crowd scenes via structure analysis,” IEEE Trans. Cybern. 45(3), 548–561 (2015). https://doi.org/10.1109/TCYB.2014.2330853
S. Ali and M. Shah, “A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR), 1–6 (2007). https://doi.org/10.1109/CVPR.2007.382977
S. Mukherjee, D. Goswami, and S. Chatterjee, “A Lagrangian approach to modeling and analysis of a crowd dynamics,” IEEE Trans. Syst. Man Cybern. 45(6), 865–876 (2015). https://doi.org/10.1109/TSMC.2015.2389763
R. Mehran, B. E. Moore, and M. Shah, “A streakline representation of flow in crowded scenes,” Lect. Notes Comput. Sci. 6313, 439–452 (2010). https://doi.org/10.1007/978-3-642-15558-1
X. Wang et al., “A high accuracy flow segmentation method in crowded scenes based on streakline,” Optik 125(3), 924–929 (2014). https://doi.org/10.1016/j.ijleo.2013.07.166
M. Gao et al., “Crowd motion segmentation and behavior recognition fusing streak flow and collectiveness,” Opt. Eng. 57(4), 043109 (2018). https://doi.org/10.1117/1.OE.57.4.043109
A. B. Chan and N. Vasconcelos, “Modeling, clustering, and segmenting video with mixtures of dynamic textures,” IEEE Trans. Pattern Anal. Mach. Intell. 30(5), 909–926 (2008). https://doi.org/10.1109/TPAMI.2007.70738
V. Mahadevan et al., “Anomaly detection in crowded scenes,” in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR), 1975–1981 (2010). https://doi.org/10.1109/CVPR.2010.5539872
S. Duan, X. Wang, and X. Yu, “Crowded abnormal detection based on mixture of kernel dynamic texture,” in Proc. Int. Conf. Audio, Language and Image Process., 931–936 (2014). https://doi.org/10.1109/ICALIP.2014.7009931
Y. Ma, P. Cisar, and A. Kembhavi, “Motion segmentation and activity representation in crowds,” Int. J. Imaging Syst. Technol. 19(2), 80–90 (2009). https://doi.org/10.1002/ima.v19:2
B. Zhou, X. Tang, and X. Wang, “Coherent filtering: detecting coherent motions from crowd clutters,” Lect. Notes Comput. Sci. 7573, 857–871 (2012). https://doi.org/10.1007/978-3-642-33709-3
B. Zhou et al., “Measuring crowd collectiveness,” IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1586–1599 (2014). https://doi.org/10.1109/TPAMI.2014.2300484
Z. Fan et al., “Adaptive crowd segmentation based on coherent motion detection,” J. Signal Process. Syst. 90(12), 1651–1666 (2018). https://doi.org/10.1007/s11265-017-1309-8
R. Sharma and T. Guha, “A trajectory clustering approach to crowd flow segmentation in videos,” in Proc. IEEE Int. Conf. Image Process. (ICIP), 1200–1204 (2016). https://doi.org/10.1109/ICIP.2016.7532548
G. Doretto et al., “Dynamic textures,” Int. J. Comput. Vision 51(2), 91–109 (2003). https://doi.org/10.1023/A:1021669406132
T. Roy et al., “Segmentation of a vector field: dominant parameter and shape optimization,” J. Math. Imaging Vision 24(2), 259–276 (2006). https://doi.org/10.1007/s10851-005-3627-x
H. Li, W. Chen, and I.-F. Shen, “Segmentation of discrete vector fields,” IEEE Trans. Visual Comput. Graphics 12(3), 289–300 (2006). https://doi.org/10.1109/TVCG.2006.54
T. Brox et al., “High accuracy optical flow estimation based on a theory for warping,” Lect. Notes Comput. Sci. 3024, 25–36 (2004). https://doi.org/10.1007/b97873
P. Jaccard, “The distribution of the flora of the Alpine zone,” New Phytol. 11(2), 37–50 (1912). https://doi.org/10.1111/nph.1912.11.issue-2
Biography

Haibin Yu received his PhD in communication and information systems from Zhejiang University in 2007. He is currently an associate professor in the College of Electronics and Information, Hangzhou Dianzi University. He was a visiting scholar at the Department of Neurosurgery, University of Pittsburgh, from November 2016 to November 2017. His research interests include computer vision, image processing, machine learning, and multisensor fusion.

Guoxiong Pan received his undergraduate degree in electronics and information engineering from Hangzhou Dianzi University in 2017. He is currently a graduate student in electronic science and technology at Hangzhou Dianzi University. His research interests include computer vision, image processing, and embedded systems.

Li Zhang received her PhD in communication and information systems from Zhejiang University in 2008. She is currently a teacher in the School of Computer Science and Technology, Hangzhou Dianzi University. Her research interests include computer vision, image processing, and deep learning.

Zhu Li received his PhD from Tokyo University of Agriculture and Technology. He is currently an associate professor in the College of Electronics and Information, Hangzhou Dianzi University. His research interests include computer vision, image processing, and machine learning.

Mian Pan received his PhD in electronic engineering from Xidian University in 2013. He is currently a lecturer in the College of Electronics and Information, Hangzhou Dianzi University. His main research interests include statistical signal processing, adaptive signal processing, computer vision, image processing, machine learning, and their applications in industry.