Translation domain segmentation model based on improved cosine similarity for crowd motion segmentation

Haibin Yu, Guoxiong Pan, Li Zhang, Zhu Li, Mian Pan

Open Access | Published: 13 March 2019
Abstract
With the continuous growth of the global population, large-scale public gatherings have become more common, and crowd management at these gatherings has become an urgent problem for public safety management. Crowd motion analysis and early warning based on crowd motion segmentation in video surveillance systems have become important research topics in computer vision. A translation domain segmentation (TDS) model based on improved cosine similarity (ICS) is proposed to segment moving crowds with different crowding levels and complex motion modes. The method reconstructs the objective function of the basic TDS model by ICS to simultaneously measure the differences in both magnitude and direction between two vectors; thus, undersegmentation due to magnitude differences can be avoided. By switching between the “localization” and “globalization” modes of the objective function, the algorithm can be applied to segment crowds with different densities and motion states. Moreover, by simultaneously introducing local motion magnitude and local frame difference magnitude thresholds, nonforeground regions can be excluded from the initial regions used during region evolution. Experimental results show that the proposed method achieves superior performance and higher accuracy compared with existing flow field-based methods when applied to complex scenes containing moving crowds.

1.

Introduction

With the continuous growth of the global population, large-scale public gatherings—such as religious pilgrimages, parades, celebrations, and sporting events—have become more frequent; therefore, crowd management at these gatherings has become an urgent and important problem to be addressed in the field of public safety management. Usually, people in these gatherings move in a limited space, such as urban streets, semienclosed squares, and fully enclosed shopping malls, which are prone to catastrophic accidents, such as crowd congestion and trampling. To effectively supervise and guide crowds at such gatherings for avoiding accidents, video/image crowd monitoring technology has been widely used in video surveillance systems in many public places.1–8 Considering that the most common cause of various accidents is collision among crowds moving in different directions, dividing moving crowds into subgroups based on their movement direction is particularly important; this has been an important research topic in the field of image processing and computer vision.9

In recent years, researchers have proposed a variety of methods for motion segmentation of crowds at different congestion levels in complex scenes;10–12 these can be classified as traditional methods based on single person tracking13,14 and motion modeling methods that model the overall motion characteristics of the crowd.15–26 Among them, segmentation methods based on single person tracking rely on the complete trajectories of all individuals (or feature points and feature areas) in the crowd; therefore, they are only suitable when there is no occlusion or only slight occlusion. Once crowd density increases, occlusion becomes severe and the acquisition of complete trajectories becomes very difficult, resulting in a significant drop in the performance of such methods.

Due to the difficulties encountered in tracking individuals (such as occlusion), and considering the overall self-organization characteristics of a moving crowd, many researchers have started to focus on the entire crowd instead of individuals in the crowd so that an appropriate motion model can be adopted to model the overall motion of the crowd. At present, studies mainly focus on two kinds of methods for crowd motion modeling: flow-based models and nonflow-based models.9 Because a flow field such as the optical flow field is usually a good estimate of the motion field, crowd motion segmentation based on a flow field model was one of the earliest approaches proposed, and it is the basis for a variety of subsequently proposed methods. In crowd motion segmentation methods based on flow models, apart from the most commonly used optical flow fields,15,17–21 the flow fields used for crowd motion modeling also include the particle flow field22,23 and the streak flow field.24–26 Most existing flow-based models only focus on a specific crowd density. For example, particle flow fields and streak flow fields are, generally, only suitable for a high-density slow-moving crowd. When this kind of model is applied to a low-density crowd, the possibility of oversegmentation is high. To make flow field-based motion models applicable to crowds of any density, considering that different flow fields are essentially vector fields, some researchers introduced vector domain segmentation methods for specific vector distributions in the flow field model, such as the local translation domain segmentation (LTDS) proposed by Wu and Wong.20 Although LTDS can be applied for motion segmentation of crowds with different densities, it is prone to undersegmentation between the foreground and background because its objective function considers only the direction differences between vectors, regardless of the magnitude differences.

Among the nonflow-based methods, the most representative are dynamic texture-based methods27–30 and tracklet-based methods.31–34 Dynamic texture-based methods first model the moving crowd as a dynamic texture35 with spatiotemporal statistical properties, and then use the matching among model parameters to perform crowd motion segmentation or abnormal behavior detection. As current dynamic texture models are relatively simple (such as the linear dynamic system), crowd motion segmentation methods based on dynamic texture are currently only applicable to crowds with relatively simple movement modes and medium or low densities. Tracklet-based methods first obtain the tracklets of the keypoints in the crowd using a tracker [such as Kanade–Lucas–Tomasi (KLT)], and then apply a similarity measure among the tracklets to complete the crowd motion segmentation. For example, Sharma and Guha34 used the tracklets acquired by KLT to form complete trajectories of keypoints, and then applied the trajectory clustering approach (TCA) to complete the segmentation of moving crowds. However, this method is not suitable for short-term motion segmentation owing to its requirement for long-term trajectories of the keypoints in the crowd. Zhou et al.31,32 and Fan et al.33 utilized coherent filtering (CF) based on coherent neighbor invariance to perform short-term crowd motion segmentation via the local spatiotemporal relationships and motion correlations among tracklets. However, because coherent neighbor invariance primarily focuses on pairwise motion consistency and ignores the motion differences among all the keypoints in the local region of the center point, undersegmentation occurs easily in crowds with numerous motion patterns.

In this study, a TDS method based on improved cosine similarity (ICS) is proposed for crowd motion segmentation. This method adopts the optimized vector domain segmentation model in the flow field and can be applied for the segmentation of crowds with different crowding levels and complex motion modes. The main contributions of this study are as follows:

  • 1. ICS is used to reconstruct the objective function of the TDS model. Compared with the similarity measure used in TDS, which can only measure the direction difference between vectors, ICS can measure the differences in both magnitude and direction between two vectors, which effectively avoids undersegmentation caused by magnitude differences; thus, it greatly improves the adaptability of the TDS model.

  • 2. The proposed method can be applied for motion segmentation of crowds with different densities and motion modes. When used for segmenting a crowd with high density and slow movement, the objective function can be localized so that the segmentation is completed by evaluating the motion consistency among local regions; when used for segmenting a moving crowd with medium- or low-density, the segmentation can be accomplished directly by evaluating the motion consistency between all vectors and each initial region.

  • 3. In the selection of initial regions, both a local motion magnitude threshold and a local frame difference magnitude threshold are applied to all candidate regions with the best local motion consistency, so as to exclude from the initial regions, as far as possible, the nonforeground regions that would inevitably lead to incorrect segmentation results.

2.

Related Works

To overcome the problems of traditional methods based on individual tracking, researchers have performed crowd motion segmentation by establishing a series of crowd motion models according to the self-organization characteristics of a moving crowd. Currently, most crowd motion models are based on flow fields. Apart from the most commonly used optical flow field,15,17–21 the flow fields used for modeling a moving crowd also include the particle flow field22,23 and the streak flow field.24–26

2.1.

Optical Flow Field

Research on modeling a moving crowd by optical flow is the most developed. For example, Hu et al.15 used a Gaussian adaptive resonance theory network to extract the prominent flow vectors from a dense optical flow field and then constructed a directed neighborhood graph based on the shortest path search for these flow vectors. A hierarchical agglomerative clustering algorithm was then used to segment the directed neighborhood graph to obtain the final segmentation results of a moving crowd. Zhang et al.16 obtained trajectory chains of feature points using an orientation distribution function in a sparse optical flow field; then, they used a spectral clustering method (with the number of clusters determined by prior knowledge) to classify the trajectories and thereby perform slow-motion segmentation in densely crowded locations. This type of approach usually establishes a crowd motion model directly based on the similarity among optical flow vectors. Because the model is relatively simple, it is generally only applicable to a crowd with good motion consistency.

2.2.

Particle Flow Field

The particle flow field is based on the Lagrangian fluid dynamics framework. When used for motion segmentation, the particle trajectories can be estimated by numerical integration of a moving particle grid through the optical flow field. For example, Ali and Shah22 first used the Lagrangian particle dynamics (LPD) algorithm to estimate the particle flow field based on the optical flow field, and then used the finite-time Lyapunov exponent to extract the boundaries among different flow fields to obtain the segmentation results for a crowd. Because a particle flow field ignores spatial-domain variation and introduces an obvious time delay, crowd motion segmentation based on the particle flow field is only suitable for a high-density slow-moving crowd.

2.3.

Streak Flow Field

To obtain a better flow-based model for a moving crowd, Mehran et al.24 introduced the streak line from fluid dynamics to estimate the motion field of a crowd; the result is called streak flow. Because streak flow preserves the motion information of the flow over a period, its segmentation performance is better than that of particle flow when applied to crowds with obvious motion changes. However, the streak flow field is computed on the basis of the optical flow field, and its calculation requires highly accurate optical flow, which is very difficult to obtain using conventional methods.

2.4.

Vector Domain Segmentation for Flow Field Model Application

To make the flow field model suitable for crowds with different densities and motion modes, some researchers have used the vector domain segmentation model36,37 in the vector field to establish a flow-based model of a moving crowd, such as the LTDS model proposed by Wu and Wong.20 Through the normalization of the vector magnitude and the addition of foreground area constraints, LTDS extends the TDS model proposed by Roy et al.36 to the nonunit field; it can be applied to the segmentation of crowds with different densities by localization. However, because the objective function adopted by LTDS only considers the direction difference between vectors, undersegmentation between foreground and background can easily occur.

The proposed crowd motion segmentation method is based on a segmentation model in the optical flow field. The ICS measure is used to replace the similarity measure based on normalized vector inner product in the objective function of LTDS. The improved objective function can distinguish between both magnitude and direction of vectors to achieve better segmentation performance.

3.

Method

The overall architecture of the crowd motion segmentation method proposed in this study is shown in Fig. 1. When applying the algorithm to a high-density slow-moving crowd: (1) the optical flow field between two consecutive frames is first obtained by means of an optical flow extraction algorithm (such as the Brox algorithm in Ref. 38); (2) the local ICS is used to obtain the local motion consistency map (LMCM); (3) the initial regions for level set evolution are extracted from the LMCM; (4) through the level set algorithm, the objective function of ICS-based LTDS is used to complete the region evolution in the LMCM, and regions with multiple motion directions are obtained; and (5) the final crowd segmentation results are obtained by merging the regions with the same direction. In contrast, when the algorithm is used for crowds with medium or low densities and complex motion modes, after the initial region extraction step, region evolution is performed not in the LMCM but in the global motion consistency map (GMCM) corresponding to each initial region; the GMCM is obtained by calculating the ICS between the average vector of each initial region and all the vectors in the field.
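Before detailing each component, the overall flow can be summarized in a short sketch. This is a minimal illustration of the architecture in Fig. 1, not the authors' implementation; all helper names (`optical_flow`, `local_consistency_map`, `initial_regions`, `level_set_evolve`, `global_consistency_map`, `mean_vector`, and `merge_same_direction`) are hypothetical placeholders for the steps described above.

```python
def segment_crowd(frame1, frame2, high_density, thresholds):
    """Sketch of the proposed pipeline; every helper below is a placeholder."""
    U = optical_flow(frame1, frame2)                # step 1: e.g., Brox et al. (Ref. 38)
    lmcm = local_consistency_map(U)                 # step 2: local ICS errors, Eq. (16)
    seeds = initial_regions(lmcm, U, frame1, frame2, thresholds)  # step 3: Sec. 3.4

    regions = []
    for seed in seeds:
        if high_density:                            # high-density, slow-moving crowd
            regions.append(level_set_evolve(lmcm, seed))   # step 4: ICS-LTDS, Eq. (19)
        else:                                       # medium/low density, complex motion
            gmcm = global_consistency_map(U, mean_vector(U, seed))
            regions.append(level_set_evolve(gmcm, seed))   # step 4: ICS-TDS, Eq. (18)
    return merge_same_direction(regions, U)         # step 5
```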

Fig. 1

Architecture of the proposed method.


3.1.

TDS and LTDS

Given a unit vector field E, the translation domain (TD) is defined as the region in which all vectors have the same direction. The field lines in a TD are a set of parallel lines. Therefore, there must exist a unique normal vector perpendicular to the field lines, denoted as a, which can be used as a characteristic parameter to represent the TD; a is also known as the dominant translation parameter. Roy et al.36 proved that a vector domain Ω is a TD if and only if there is a unique vector a(Ω) that, for all x ∈ Ω, satisfies the following equation:

Eq. (1)

a(Ω)·E(x)=0.
According to Eq. (1), Roy et al.36 proposed a TDS model for vector domain segmentation. The corresponding objective function of the model is as follows:

Eq. (2)

J(\tau) = \int_{\Omega(\tau)} [a(\tau) \cdot E(x)]^2 \, dx - \mu \int_{\Omega(\tau)} dx,
where Ω is a TD; τ is an evolution parameter defined to facilitate the use of the active contour algorithm; [a(τ)·E(x)]² measures the motion consistency between each vector in E and the TD determined by a(τ), and the map of these values can be called the GMCM corresponding to a(τ); the integral term ∫_{Ω(τ)} [a(τ)·E(x)]² dx represents the sum of the squared errors over Ω in the GMCM, which reflects the motion consistency of Ω; the integral term ∫_Ω dx represents the area of Ω; and μ is a positive constant.

On the basis of Eq. (2), given a deformation velocity V, the Gâteaux derivative in the direction of V can be given as follows:

Eq. (3)

\frac{dJ}{d\tau}(\tau, V) = \int_{\partial\Omega(\tau)} \big[ [a(\tau) \cdot E(x)]^2 - \mu \big] \, V(x) \cdot N_\tau(x) \, dl(x),
where ∂Ω is the boundary of Ω, dl(x) is the integration variable along ∂Ω, and N_τ is the unit normal vector of ∂Ω pointing toward the interior of Ω. The minimization of Eq. (2) is then performed with the active contour algorithm by descending along Eq. (3); the corresponding evolution equation of the active contour algorithm is as follows:

Eq. (4)

\begin{cases} \dfrac{\partial C}{\partial \tau}(\tau) = \big[ [a(\tau) \cdot E]^2 - \mu \big] N_\tau \\ C(\tau = 0) = C_0, \end{cases}
where C is ∂Ω, the contour of the TD being evolved, and C0 is its initial value.

Meanwhile, Roy et al.36 also proposed an estimation method for the dominant translation parameter a(τ), which takes the first term on the right side of Eq. (2) as the objective function and obtains the optimal estimate of a(τ) by minimizing it, as follows:

Eq. (5)

\hat{a}(\tau) = \arg\min_a \left[ \int_{\Omega(\tau)} [a(\tau) \cdot E(x)]^2 dx \right] = \arg\min_a \left\{ a(\tau) \left[ \int_{\Omega(\tau)} E(x) E(x)^T dx \right] a(\tau)^T \right\} = \arg\min_a \left[ a(\tau) \, Q_\tau \, a(\tau)^T \right].
As Q_τ is a real symmetric matrix, the optimal estimate â(τ) of a(τ) can be determined as the eigenvector corresponding to the smallest eigenvalue of Q_τ via the quadratic programming model.
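As a concrete illustration, the following minimal sketch estimates â(τ) numerically; it assumes the vectors of Ω(τ) are stacked in an N × 2 array, which is our own convention rather than anything prescribed by Ref. 36.

```python
import numpy as np

def dominant_translation_parameter(E):
    """Estimate a(Omega) per Eq. (5) as the eigenvector of Q_tau
    associated with its smallest eigenvalue.

    E: (N, 2) array holding the field vectors E(x) inside Omega(tau).
    """
    Q = E.T @ E                       # discrete Q_tau = sum of E(x) E(x)^T
    _, eigvecs = np.linalg.eigh(Q)    # eigh returns eigenvalues in ascending order
    return eigvecs[:, 0]              # eigenvector of the smallest eigenvalue

# In an ideal TD all vectors are parallel, so a is perpendicular to them
# and a . E(x) is approximately 0 for every x:
E = np.tile([1.0, 2.0], (100, 1)) + 1e-3 * np.random.randn(100, 2)
a = dominant_translation_parameter(E)
print(np.abs(E @ a).max())            # close to 0
```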

Because the TDS model is proposed for the TD in the ideal unit vector field, it cannot be directly applied to the segmentation of an actual motion field. To make the TDS model applicable to crowd motion segmentation, Wu and Wong20 established the LTDS model by normalizing the vectors, adding a foreground motion area constraint, and localizing Eq. (2). The objective function of LTDS is as follows:

Eq. (6)

J(A, \tau) = \int_{\Omega(\tau)} F[A(x), x] \, dx - \mu \int_{\Omega(\tau)} G(\|U(x)\|) \, dx,
\quad F[A(x), x] = \frac{1}{|\mathcal{N}(x)|} \int_{\mathcal{N}(x)} \left( A(x) \cdot \frac{U(\hat{x})}{\|U(\hat{x})\|} \right)^2 d\hat{x},
\quad G(\|U(x)\|) = \frac{1}{1 + (\gamma / \|U(x)\|)^2},
where Ω is a local TD; U is a motion field whose magnitude ‖U‖ is mapped to the range [0, 1]; 𝒩(x) is a neighborhood centered on x and |𝒩(x)| is the area of 𝒩(x); A(x) is the dominant translation parameter of 𝒩(x) and can be estimated according to Eq. (5); and F[A(x), x] is the mean of the squared errors inside 𝒩(x), reflecting the local motion consistency. Given the size of 𝒩(x), we can calculate F[A(x), x] for each vector in U and use these values to form an LMCM, which describes the consistent slow motion of high-density crowds better than a GMCM. γ is a constant; G(‖U(x)‖) increases monotonically with ‖U(x)‖ and measures the extent to which x belongs to the foreground, so that μ∫_{Ω(τ)} G(‖U(x)‖) dx can be considered the foreground area constraint.

Similar to the evolution equation [Eq. (4)] of TDS, and considering that F[A(x),x] is an even function, which makes it impossible to distinguish two completely opposite vectors, the evolution equation of the active contour algorithm corresponding to the LTDS model is as follows:

Eq. (7)

K(x) = \begin{cases} F[A(x), x] - \mu G(\|U(x)\|), & \text{if } U(x) \cdot \bar{U}_{\mathcal{N}(x) \cap \Omega} > 0 \\ -\big( F[A(x), x] - \mu G(\|U(x)\|) \big), & \text{if } U(x) \cdot \bar{U}_{\mathcal{N}(x) \cap \Omega} \le 0, \end{cases}
\qquad
\begin{cases} \dfrac{\partial C}{\partial \tau}(\tau) = K(x) \cdot N_\tau \\ C(\tau = 0) = C_0, \end{cases}
where Ū_{𝒩(x)∩Ω} is the average vector of the region 𝒩(x) ∩ Ω.

3.2.

ICS

It is well known that the cosine similarity (CS) measure uses the cosine of the angle between two vectors in the vector space to measure the difference between these two vectors:

Eq. (8)

CS(x, y) = \cos\theta_{xy} = \frac{x \cdot y}{\|x\| \, \|y\|},
where θxy is the angle between vectors x and y. The closer the cosine of the angle is to 1, which indicates that the angle will be close to 0, the more similar the two vectors will be. However, as the CS normalizes the magnitudes of the two vectors during calculation, it is impossible to measure the difference between the magnitudes of the two vectors.

To measure both the magnitude and direction differences between two vectors using CS, we first add to the original CS of Eq. (8) a scale factor that measures the difference between the magnitudes of the two vectors; it is defined as follows:

Eq. (9)

k_M(x, y) = \frac{\min(\|x\|, \|y\|)}{\max(\|x\|, \|y\|)}.

Considering that the range of k_M(x, y) is [0, 1] and that of CS is [−1, 1], to unify the ranges of both values, the CS is adjusted by a linear mapping to

Eq. (10)

CS_1(x, y) = \frac{1}{2}(1 + \cos\theta_{xy}) = \frac{1}{2}\left(1 + \frac{x \cdot y}{\|x\| \, \|y\|}\right).
Based on Eqs. (9) and (10), the ICS is given as follows:

Eq. (11)

ICS(x, y) = CS_1(x, y) \cdot k_M(x, y) = \frac{1}{2}\left(1 + \frac{x \cdot y}{\|x\| \, \|y\|}\right) \cdot \frac{\min(\|x\|, \|y\|)}{\max(\|x\|, \|y\|)} = \frac{x \cdot y + \|x\| \, \|y\|}{2 \max^2(\|x\|, \|y\|)}.

The basic ICS defined in Eq. (11) cannot be directly used for TD segmentation because it evaluates the magnitude and direction differences in an imbalanced way. For example, if the similarity is given as 0.5, when k_M ≈ 1 (i.e., there is almost no difference in magnitude between the two vectors), cos θ_xy ≈ 0 can be obtained from Eqs. (10) and (11), so θ_xy ≈ π/2; however, when θ_xy ≈ 0 (i.e., there is almost no difference in direction between the two vectors), CS_1(x, y) ≈ 1, so k_M ≈ 0.5. This shows that the ICS between two vectors with a small difference in direction and half the difference in magnitude is the same as the ICS between two vectors that are perpendicular to each other and have the same magnitude, as shown in Fig. 2(a). Considering that there are two vectors perpendicular to a given vector in the same plane, ICS(x, y1) = ICS(x, y2) = ICS(x, y3).
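The imbalance can be checked with a few lines of code. The sketch below transcribes Eq. (11) directly (the variable names are ours) and reproduces the configuration of Fig. 2: a vector of half the magnitude but identical direction scores the same 0.5 as a perpendicular vector of equal magnitude.

```python
import numpy as np

def ics_basic(x, y, eps=1e-12):
    """Basic ICS of Eq. (11): CS1(x, y) * k_M(x, y)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cs1 = 0.5 * (1.0 + np.dot(x, y) / max(nx * ny, eps))
    k_m = min(nx, ny) / max(nx, ny, eps)
    return cs1 * k_m

x  = np.array([1.0, 0.0])
y1 = np.array([0.5, 0.0])   # same direction, half magnitude
y2 = np.array([0.0, 1.0])   # perpendicular, same magnitude
print(ics_basic(x, y1), ics_basic(x, y2))  # both 0.5: the imbalance of Fig. 2
```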

Fig. 2

Imbalance of vectors with the same ICS defined in Eq. (11). (a) The ICS between two vectors with no difference in direction but half the difference in magnitude [i.e., ICS(x, y1)] is the same as the ICS between two vectors perpendicular to each other with the same magnitude [i.e., ICS(x, y2) and ICS(x, y3)]. (b) An optical flow field containing optical flow vectors having the relationship described in panel (a) (the vector in red is the given reference vector x).


For the actual application in this study, ICS(x, y1), ICS(x, y2), and ICS(x, y3) should not be the same. Although the magnitude of y1 is quite different from that of x, they can be considered to have good similarity because they have the same directional angle (i.e., exactly the same direction of motion). In contrast, both y2 and y3 are perpendicular to x, resulting in a large difference in their direction of motion. In this case, even if they have the same magnitude as x, the similarity between them should be very low. Otherwise, two vectors with perpendicular or even opposite directions may be segmented into the same region; i.e., all the vectors in Fig. 2(b) would be segmented into the same region. That is, when segmenting vectors, the similarity between the directional angles is significantly more important than the similarity between the magnitudes; for the case shown in Fig. 2, the condition ICS(x, y1) ≫ ICS(x, y2) = ICS(x, y3) should be satisfied. To enhance the proportion of the directional angle similarity in the basic ICS defined in Eq. (11), we add an exponential adjustment factor β to the term CS_1(x, y) that measures the directional angle difference in ICS; the ICS is then redefined as follows:

Eq. (12)

ICS(x, y, \beta) = [CS_1(x, y)]^\beta \cdot k_M(x, y) = \left[\frac{1}{2}\left(1 + \frac{x \cdot y}{\|x\| \, \|y\|}\right)\right]^\beta \cdot \frac{\min(\|x\|, \|y\|)}{\max(\|x\|, \|y\|)},
where β > 1. The larger β is, the greater the proportion of the directional angle similarity in ICS. Meanwhile, if the minimum similarity s_min is given, i.e., ICS(x, y, β) ≥ s_min, then the larger β is, the smaller the variation range (defined as R_θ^β) of the angle difference θ_xy, as shown in Fig. 3.
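A direct transcription of Eq. (12) (again with our own variable names) shows how the exponent restores the intended ordering; with the default β set to the value derived below, the perpendicular pair from Fig. 2 now scores near zero while the half-magnitude pair keeps its score of 0.5.

```python
import numpy as np

def ics(x, y, beta=9.9969, eps=1e-12):
    """ICS of Eq. (12): [CS1(x, y)]^beta * k_M(x, y)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cs1 = 0.5 * (1.0 + np.dot(x, y) / max(nx * ny, eps))
    k_m = min(nx, ny) / max(nx, ny, eps)
    return cs1 ** beta * k_m

x = np.array([1.0, 0.0])
print(ics(x, np.array([0.5, 0.0])))  # same direction, half magnitude -> 0.5
print(ics(x, np.array([0.0, 1.0])))  # perpendicular, same magnitude -> ~0.001
```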

Fig. 3

Relationship between R_θ^β and β. When the minimum similarity s_min is given, i.e., ICS(x, y, β) ≥ s_min, the larger β is, the smaller R_θ^β will be.


The value of β can be determined according to the distribution of the vectors in the optical flow field, especially the distribution of the vector directions, by using the maximum allowable R_θ^β at the minimum similarity s_min. For example, let s_min be 0.5 and R_θ^β be π/6 at this similarity [as shown in Fig. 4(a)]; when k_M = 1, [CS_1(x, y)]^β takes its minimum value 0.5 and the corresponding direction angle θ_xy takes its maximum value π/6, so the following equation holds:

Eq. (13)

[CS_1(x, y)]^\beta = \left[\frac{1}{2}(1 + \cos\theta_{xy})\right]^\beta = \left[\frac{1}{2}\left(1 + \cos\frac{\pi}{6}\right)\right]^\beta = \frac{1}{2}.
By solving Eq. (13), we obtain β = 9.9969. Once β is determined, for a given minimum similarity s_min = 0.5, θ_xy is limited to at most π/6 irrespective of the change in k_M, as shown in the shaded area in Fig. 4(b). In other words, when β = 9.9969, in the limit case, the ICS with θ_xy = 0 and k_M = 0.5 equals the ICS with θ_xy = π/6 and k_M = 1, i.e., ICS(x, y1, 9.9969) = ICS(x, y2, 9.9969) = ICS(x, y3, 9.9969) [as shown in Fig. 4(a)]; this avoids the extreme imbalance between the magnitude difference and the direction difference shown in Fig. 2. In this manner, when using the ICS to perform the segmentation of a TD, if the minimum similarity s_min = 0.5 is given, the maximum directional angle difference among all vectors in the TD is restricted to 2R_θ^{9.9969} = 2 × (π/6) = π/3, and the normalized maximum magnitude difference is limited to [0.5, 1].
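The solution of Eq. (13) is a one-liner, shown here as a quick numerical check (the small discrepancy with the quoted 9.9969 is rounding):

```python
import numpy as np

# Eq. (13): [(1 + cos(pi/6)) / 2]^beta = 1/2
# => beta = ln(1/2) / ln((1 + cos(pi/6)) / 2)
beta = np.log(0.5) / np.log(0.5 * (1.0 + np.cos(np.pi / 6)))
print(beta)  # ~9.997, i.e., the paper's beta = 9.9969 up to rounding
```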

Fig. 4

Determination of β and its effect, given s_min = 0.5 and R_θ^β = π/6 corresponding to s_min. (a) From ICS(x, y1, β) = ICS(x, y2, β) = ICS(x, y3, β), β = 9.9969 can be determined. (b) After β is determined, the values of k_M and θ_xy are both limited to the shaded area.


3.3.

ICS-Based TDS and ICS-Based LTDS

It can be seen from Eqs. (2) and (6) that F[A(x), x] in the objective function of LTDS plays the same role as the term ∫_{Ω(τ)} [a(τ)·E(x)]² dx in the objective function of TDS; both essentially use the CS to measure whether the vector to be segmented is perpendicular to the dominant translation parameter a(Ω) of Ω. In practice, as Ω evolves, it approximates a unit TD less and less well, and the difference between the magnitudes of its vectors may also increase. In this case, if the magnitude difference is not counterbalanced owing to an improper selection of μ, undersegmentation may occur; i.e., regions whose vectors have a small direction difference but a large magnitude difference may be segmented into one region, as shown in Fig. 5. In an actual motion field, the magnitude difference is often an important basis for distinguishing the foreground from the background; however, the error constraints in the objective functions of both TDS and LTDS do not consider the magnitude difference between vectors, which causes the segmentation results to include more background in the motion regions.

Fig. 5

Undersegmentation caused by the magnitude difference between foreground and background. On the left, the vectors of the foreground (gray shaded region) and background (green shaded region) have no difference in direction but an obvious difference in magnitude. When the initial region Ω of the active contour algorithm is inside Ω1 (as indicated by the red dashed line on the left), if the magnitude difference is not counterbalanced owing to an improper selection of μ, the evolution result Ω (the area within the red border on the right) will contain both the foreground and the background, i.e., Ω = Ω1 ∪ Ω2.


Because ICS can measure the differences in both direction and magnitude between vectors, it is expected to improve the segmentation performance when used to improve the objective functions of LTDS and TDS. The key is to use ICS to redefine the error constraint term in these objective functions.

Referring to Eq. (2), the objective function of the TDS redefined by ICS (ICS-TDS) is as follows:

Eq. (14)

J(\tau) = \int_{\Omega(\tau)} \big(1 - ICS(b(\tau), U(x), \beta)\big)^2 dx - \mu \int_{\Omega(\tau)} G(\|U(x)\|) \, dx
= \int_{\Omega(\tau)} \left(1 - \left[\frac{1}{2}\left(1 + \frac{b(\tau) \cdot U(x)}{\|b(\tau)\| \, \|U(x)\|}\right)\right]^\beta \cdot \frac{\min(\|b(\tau)\|, \|U(x)\|)}{\max(\|b(\tau)\|, \|U(x)\|)}\right)^2 dx - \mu \int_{\Omega(\tau)} G(\|U(x)\|) \, dx,
where b(Ω) is a vector parallel to the field lines of the TD Ω. Similar to a(Ω), b(Ω) can also be used as the dominant translation parameter representing Ω. However, unlike a(Ω), because the magnitude of b(Ω) is involved in the minimum and maximum calculations, its optimal estimate cannot be obtained by the quadratic programming algorithm. Instead, according to the definition of b(Ω), we can approximate it by the average vector Ū_Ω of all vectors in Ω, i.e.,

Eq. (15)

\hat{b}(\tau) = \bar{U}_{\Omega} = \frac{1}{|\Omega(\tau)|} \int_{\Omega(\tau)} U(x) \, dx.
Although Eq. (15) does not yield the optimal estimate of b(Ω), its computation is greatly simplified compared with that of a(Ω); the time complexity is reduced from O(|Ω|³) to O(|Ω|).

After defining the objective function of ICS-TDS given by Eq. (14), and referring to the localization method in Eq. (6), we can obtain the objective function of ICS-based LTDS (ICS-LTDS) by redefining F[A(x), x] with ICS:

Eq. (16)

J(\tau) = \int_{\Omega(\tau)} F_{ICS}[B(\tau, x), x] \, dx - \mu \int_{\Omega(\tau)} G(\|U(x)\|) \, dx,
\quad F_{ICS}[B(\tau, x), x] = \frac{1}{|\mathcal{N}(x)|} \int_{\mathcal{N}(x)} \big(1 - ICS(B(\tau, x), U(\hat{x}), \beta)\big)^2 d\hat{x},
\quad G(\|U(x)\|) = \frac{1}{1 + (\gamma / \|U(x)\|)^2},
where F_ICS[B(τ, x), x] is the mean of the squared ICS-based errors in the neighborhood 𝒩(x), which can be used to form the ICS-based LMCM; the definitions of the other parameters are the same as those in Eq. (6). It should be noted that the purpose of localization is to prevent the global parameter a(Ω) or b(Ω) from affecting the error estimation within the local range; therefore, the estimate of B(τ, x) in F_ICS[B(τ, x), x] should not be the average vector Ū_{Ω(τ)} of Ω(τ) defined by Eq. (15), but the average vector Ū_{𝒩(x)} of 𝒩(x), as follows:

Eq. (17)

\hat{B}(\tau, x) = \bar{U}_{\mathcal{N}(x)} = \frac{1}{|\mathcal{N}(x)|} \int_{\mathcal{N}(x)} U(\hat{x}) \, d\hat{x}.
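In discrete form, Eqs. (16) and (17) amount to sliding a window over the flow field, taking the window mean as B, and averaging the squared (1 − ICS) errors. The following sketch (a brute-force loop, assuming the `ics` function from Sec. 3.2 and a dense flow field of shape (H, W, 2)) is meant only to make the definitions concrete:

```python
import numpy as np

def ics_lmcm(U, half=3, beta=9.9969):
    """ICS-based LMCM: F_ICS of Eq. (16) with B estimated by Eq. (17).

    U: (H, W, 2) optical flow field; half: half-width of the square
    neighborhood N(x). Returns an (H, W) map of F_ICS values.
    """
    H, W, _ = U.shape
    F = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            win = U[max(i - half, 0):i + half + 1,
                    max(j - half, 0):j + half + 1].reshape(-1, 2)
            B = win.mean(axis=0)                                        # Eq. (17)
            F[i, j] = np.mean([(1.0 - ics(B, v, beta)) ** 2 for v in win])  # Eq. (16)
    return F
```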

Similar to TDS and LTDS, the minimization of the objective functions in Eqs. (14) and (16) can also be performed by the active contour algorithm. The evolution equation of ICS-TDS can be obtained by rewriting Eq. (4) as follows:

Eq. (18)

\begin{cases} \dfrac{\partial C}{\partial \tau}(\tau) = \left( \left(1 - \left[\dfrac{1}{2}\left(1 + \dfrac{b(\tau) \cdot U(x)}{\|b(\tau)\| \, \|U(x)\|}\right)\right]^\beta \cdot \dfrac{\min(\|b(\tau)\|, \|U(x)\|)}{\max(\|b(\tau)\|, \|U(x)\|)}\right)^2 - \mu G \right) N_\tau \\ C(\tau = 0) = C_0. \end{cases}

As F_ICS[B(τ, x), x] is not an even function, vectors with opposite directions can be well distinguished. Therefore, the evolution equation of the active contour algorithm corresponding to ICS-LTDS can be rewritten by referring to Eq. (7):

Eq. (19)

\begin{cases} \dfrac{\partial C}{\partial \tau}(\tau) = \big[ F_{ICS}[B(\tau, x), x] - \mu G(\|U(x)\|) \big] \cdot N_\tau = \big[ F_{ICS}(\tau) - \mu G \big] \cdot N_\tau \\ C(\tau = 0) = C_0. \end{cases}

3.4.

Acquisition of Initial Regions

As is well known, for the active contour algorithm, the selection of the initial regions has a significant impact on the final segmentation results; segments evolved from erroneous initial regions may become intertwined with the correct segments and cannot be eliminated afterward. To obtain suitable initial regions for evolution, Wu and Wong20 first selected the minimum points p in the LMCM and then calculated the mean F̄p of F and the mean optical flow magnitude M̄p in the neighborhood of each p. Only the neighborhood of a point with F̄p and M̄p higher than given thresholds, indicating that the motion consistency in its neighborhood is good enough and that the neighborhood is sufficiently likely to be foreground, is taken as an initial region. However, when this selection scheme is used for medium- or low-density crowds, optical flow estimation errors can produce large optical flow magnitudes in the background, so some erroneous initial regions may be selected in the background, as shown in Fig. 6(b); the initial regions within the red circle are all erroneous initial regions.

Fig. 6

Example of the acquisition of initial regions. (a) The minimum points (red) extracted from the LMCM; (b) the initial regions extracted by F̄p and M̄p, where the initial regions within the red circle are erroneous initial regions in the background; and (c) the initial regions extracted after the introduction of D̄p, where all the erroneous initial regions in (b) are removed.


Although the optical flow magnitudes in these erroneous regions are relatively large, the gray-level changes in these background regions are actually small and can easily be estimated by the frame difference method. Therefore, in addition to F̄p and M̄p, we introduce the regional mean D̄p based on the frame difference for initial region determination. Only when F̄p, M̄p, and D̄p are all higher than the given thresholds, called the local error threshold THF̄, the local motion magnitude threshold THM̄, and the local frame difference magnitude threshold THD̄, respectively, is the neighborhood of the minimum point p confirmed as an initial region. The initial regions extracted after the introduction of D̄p can be seen in Fig. 6(c); comparing Figs. 6(b) and 6(c) shows that most of the erroneous initial regions in the background are removed.
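A minimal sketch of this selection rule follows; it assumes the LMCM `F`, the optical flow magnitude map `M`, the frame-difference magnitude map `D`, and the list of local minima are already computed, and it keeps a candidate only when all three window means exceed their thresholds, mirroring the rule stated above.

```python
import numpy as np

def select_initial_regions(F, M, D, minima, th_f, th_m, th_d, half=2):
    """Filter candidate minima of the LMCM (Sec. 3.4, illustrative only).

    F: LMCM; M: optical flow magnitude; D: frame-difference magnitude;
    minima: iterable of (row, col) local minima of F.
    """
    seeds = []
    for i, j in minima:
        win = np.s_[max(i - half, 0):i + half + 1, max(j - half, 0):j + half + 1]
        # Keep p only if the means of F, M, and D around p all exceed
        # TH_F, TH_M, and TH_D, respectively.
        if F[win].mean() > th_f and M[win].mean() > th_m and D[win].mean() > th_d:
            seeds.append((i, j))
    return seeds
```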

3.5.

Algorithm Flow of the Proposed Method

As mentioned above, the ICS-based LMCM is more suitable for segmenting high-density crowds with slow, consistent motion because its local ICS error constraints well reflect the motion consistency between local regions. In contrast, the ICS-based GMCM is based on the ICS error constraints between all vectors in the motion field and a given TD; therefore, it is more suitable for segmenting motion regions consistent with the given TD. If the initial regions cover all possible motion modes, the GMCMs corresponding to all the initial regions can be used to segment a moving crowd with medium or low density and complex motion modes.

In summary, the detailed flow of the proposed crowd motion segmentation algorithm based on ICS-LTDS/ICS-TDS is shown in Algorithm 1.

Algorithm 1

Proposed algorithm.

Input: Two consecutive frames of a video stream including a moving crowd (I1, I2), crowding level (L, high or low), and thresholds of F̄p, M̄p, and D̄p (THF̄, THM̄, THD̄)
Output: Set of segments with directions ({Ω1, Ω2, …, Ωn})
(1) Calculate the optical flow field U between I1 and I2 by means of an optical flow extraction algorithm
(2) Calculate the LMCM LM based on the local ICS
(3) Extract the set of all minimum points P = {p1, p2, …, pk} in LM
(4) Calculate F̄p, M̄p, and D̄p for each point in P
(5) for i = 1 to k
(6)  if F̄p(i) < THF̄ or M̄p(i) < THM̄ or D̄p(i) < THD̄
(7)   Delete pi from P
(8)  end if
(9) end for
(10) Construct the set of initial regions {C01, C02, …, C0m} from the undeleted points {p1, p2, …, pm} in P
(11) if L == “high” // crowd has high density and slow motion
(12)  for j = 1 to m
(13)   Region evolution through the level set algorithm based on the ICS-LTDS evolution equation [Eq. (19)] with the initial region C0j in LM
(14)  end for
(15)  Regions with multiple motion directions {C1, C2, …, Cm} are obtained
(16) else // crowd has medium or low density and complex motion modes
(17)  for j = 1 to m
(18)   Calculate the GMCM GMj corresponding to C0j based on ICS
(19)   Region evolution through the level set algorithm based on the ICS-TDS evolution equation [Eq. (18)] with the initial region C0j in GMj
(20)  end for
(21)  Regions with multiple motion directions {C1, C2, …, Cm} are obtained
(22) end if
(23) {Ω1, Ω2, …, Ωn} is obtained by merging the regions with the same direction in {C1, C2, …, Cm}

4.

Experiments and Discussion

In this study, two series of experiments were conducted to verify the performance of Algorithm 1. In the first series of experiments, the proposed ICS-LTDS algorithm was applied for the segmentation of high-density slow-moving crowds. In the second series of experiments, the proposed ICS-TDS algorithm was used to segment medium- or low-density crowds with complex motion modes. Most crowd video/image sequences used in the experiments were taken from two public datasets, i.e., the UCF dataset22 and the UCSD dataset,28 and a small part from YouTube (https://www.youtube.com/). In addition, methods based on flow field models, specifically, the LTDS algorithm,20 TDS algorithm,36 LPD algorithm,22 streak flow-based algorithm,24 and the prior knowledge-based trajectory tracking algorithm,16 and recently proposed typical nonflow-based methods, specifically, the CF algorithm32 and TCA,34 were applied to the same dataset for performance comparison. During the experiments, all algorithms were run on the MATLAB 8.6 platform in the Windows 10 Pro environment. To evaluate the segmentation performance of the algorithms, the ground truth was set manually for some representative images. Meanwhile, on the basis of the ground truth, the numerical segmentation accuracy (SA) corresponding to different directions was given by means of the Jaccard similarity coefficient39 among sets. The numerical SA is calculated as follows:

Eq. (20)

SA_i(\Omega, T) = \frac{|\Omega_i \cap T_i|}{|\Omega_i \cup T_i|} \times 100\% = \frac{|\Omega_i \cap T_i|}{|\Omega_i| + |T_i| - |\Omega_i \cap T_i|} \times 100\%,
where i represents the index of the regions corresponding to different motion directions, Ωi is the segmentation result corresponding to i, and Ti is the ground truth region corresponding to i.
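For binary region masks, Eq. (20) reduces to a few numpy operations; the sketch below assumes boolean arrays for Ω_i and T_i.

```python
import numpy as np

def segmentation_accuracy(seg_mask, gt_mask):
    """Jaccard-based SA of Eq. (20) for one motion direction.

    seg_mask, gt_mask: boolean arrays for Omega_i and T_i.
    """
    inter = np.logical_and(seg_mask, gt_mask).sum()
    union = np.logical_or(seg_mask, gt_mask).sum()
    return 100.0 * inter / union if union else 0.0
```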

4.1.

Segmentation of High-Density Slow-Moving Crowds

Algorithm 1 is applied for the segmentation of high-density slow-moving crowds, as described in Sec. 3.5, in the LMCM by using the evolution equation [Eq. (19)] based on the objective function of ICS-LTDS (β = 9.9969). In all experiments, if the size of the entire image is mapped to [0, 1] × [0, 1], the size of the neighborhood 𝒩(x) is 0.02 × 0.02. The initial region of evolution is centered on a minimum of the LMCM with a size of 5 pixels × 5 pixels. The thresholds THF̄ and THM̄ are set to 0.06 and 2, respectively; the threshold THD̄ is set between 1.3 and 3.4 depending on the image sequence. The window size for calculating F̄p, M̄p, and D̄p is 5 pixels × 5 pixels. The constant γ in G is set to 0.1. The constant μ is given by referring to LTDS:20

Eq. (21)

\mu = \sin^2\left(\frac{\Delta\theta}{2}\right) \frac{\bar{F}}{\bar{G}},
where Δθ is the angle difference between the two regions to be segmented, which can be initialized by R_θ^β and adjusted according to the actual situation; F̄ and Ḡ are the average values of F and G in the initial region. For LTDS, the parameters shared with ICS-LTDS, such as the size of 𝒩(x), THF̄, THM̄, γ, and μ, are set to the same values as in ICS-LTDS for a fair performance comparison. Representative segmentation results of the flow field model-based methods (i.e., ICS-LTDS, LTDS, LPD, streak flow, and prior knowledge-based trajectory tracking) for four image sequences of high-density slow-moving crowds are shown in Fig. 7. The SA calculated according to Eq. (20) is shown in Table 1.
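As a worked example of Eq. (21) (the numbers below are invented for illustration, not taken from the experiments):

```python
import numpy as np

# Hypothetical values: Delta_theta initialized from R_theta^beta, and
# F_bar, G_bar averaged over an initial region.
delta_theta = np.pi / 3
F_bar, G_bar = 0.04, 0.8
mu = np.sin(delta_theta / 2) ** 2 * F_bar / G_bar
print(mu)  # 0.25 * 0.04 / 0.8 = 0.0125
```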

Fig. 7

Image sequences of high-density slow-moving crowds used for testing: (a) Mecca, (b) pilgrims, (c) marathon, and (d) kabba. Representative segmentation results (top to bottom): Ground truth images, segmentation results of the proposed ICS-LTDS, segmentation results of LTDS, segmentation results of LPD, segmentation results of streak flow, and segmentation results of prior knowledge-based trajectory tracking.


Table 1

Segmentation accuracies of the proposed ICS-LTDS, LTDS, LPD, streak flow, and prior knowledge-based trajectory tracking for the scenes in Fig. 7.

(Each cell: Red / Green / Average SA; “–” indicates the sequence has no region in that direction.)

| Test sequence | Proposed ICS-LTDS | LTDS20 | LPD22 | Streak flow24 | Prior knowledge-based trajectory tracking16 |
| --- | --- | --- | --- | --- | --- |
| Mecca | 0.6104 / – / **0.6104** | 0.5922 / – / 0.5922 | 0.5846 / – / 0.5846 | 0.3623 / – / 0.3623 | 0.2376 / – / 0.2376 |
| Pilgrims | 0.4200 / 0.3693 / 0.3947 | 0.4086 / 0.3502 / 0.3794 | 0.4389 / 0.4229 / **0.4309** | 0.3978 / 0.4043 / 0.4011 | 0.3256 / 0.2489 / 0.2873 |
| Marathon | 0.5275 / – / **0.5275** | 0.4527 / – / 0.4527 | 0.4937 / – / 0.4937 | 0.4742 / – / 0.4742 | 0.4159 / – / 0.4159 |
| Kabba | 0.8535 / 0.6553 / 0.7544 | 0.8671 / 0.6322 / 0.7497 | 0.8903 / 0.7792 / **0.8348** | 0.4940 / 0.5769 / 0.5355 | 0.5881 / 0.3058 / 0.4470 |
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

To compare the segmentation results of the proposed ICS-LTDS algorithm with those of the typical nonflow-based methods in scenarios with high-density slow-moving crowds, we selected pilgrims and marathon as test sequences, and applied the proposed ICS-LTDS, TCA,34 and the CF-based algorithm32 to them. Representative segmentation results obtained from them are shown in Fig. 8 and the segmentation accuracies calculated according to Eq. (20) are shown in Table 2. For TCA, the following parameter settings were used: p=16, r=40 (10 for pilgrims), τ=30, Δ=100, δ=1, α=0.1, β=20, and γ=150; for the CF algorithm, K=20, z=0.025, the upper bound of ϕ was taken as 1, and the threshold coefficient α was 0.5.

Fig. 8

Image sequences of high-density slow-moving crowds used for testing (top to bottom): pilgrims and marathon. Representative segmentation results: (a) ground truth images; (b) segmentation results of the proposed ICS-LTDS; (c) segmentation results of TCA; and (d) segmentation results of CF algorithm.


Table 2

Segmentation accuracies of the proposed ICS-LTDS, TCA, and CF algorithm for the scenes in Fig. 8.

(Each cell: Red / Green / Average SA.)

| Test sequence | Proposed ICS-LTDS | TCA34 | CF algorithm32 |
| --- | --- | --- | --- |
| Pilgrims | 0.4200 / 0.3693 / **0.3947** | 0.3029 / 0.2799 / 0.2914 | 0.3430 / 0.3216 / 0.3323 |
| Marathon | 0.5275 / – / **0.5275** | 0.4304 / – / 0.4304 | 0.4557 / – / 0.4557 |
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

Similar to crowd flow in high-density slow-moving crowds, the proposed ICS-LTDS algorithm can also be applied to slow-moving traffic flow or mixed flow. We selected two image sequences for algorithm evaluation in this noncrowd scenario. In addition, flow-based ICS-LTDS, LTDS, and LPD, and nonflow-based TCA were selected for the evaluation and comparison in the same noncrowd scenario. Representative segmentation results for two image sequences of noncrowd scenes are shown in Fig. 9. The segmentation accuracies calculated according to Eq. (20) are shown in Table 3.

Fig. 9

Image sequences of noncrowd scenes used for testing (top to bottom): roundabout and highway traffic. Representative segmentation results: (a) ground truth images; (b) segmentation results of the proposed ICS-LTDS; (c) segmentation results of LTDS; (d) segmentation results of LPD; and (e) segmentation results of TCA.


Table 3

SA of the proposed ICS-LTDS, LTDS, LPD, and TCA for the scenes in Fig. 9.

(Each cell: Red / Green / Average SA.)

| Test sequence | Proposed ICS-LTDS | LTDS20 | LPD22 | TCA34 |
| --- | --- | --- | --- | --- |
| Roundabout | 0.3818 / 0.3846 / **0.3832** | 0.3072 / 0.3464 / 0.3268 | 0.3522 / 0.3948 / 0.3775 | 0.2628 / 0.3188 / 0.2908 |
| Highway traffic | 0.3831 / 0.4753 / **0.4292** | 0.2595 / 0.4090 / 0.3343 | 0.2938 / 0.3544 / 0.3241 | 0.1786 / 0.1553 / 0.1670 |
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

Further, we obtained the ground truth of 100 consecutive frames from the marathon sequence manually to perform evaluation on a sequence containing a slow-moving crowd. The segmentation results of the representative five consecutive frames of the three best-performing flow-based algorithms (i.e., the proposed ICS-LTDS, LTDS,20 and LPD22) and the nonflow-based CF algorithm32 are shown in Fig. 10. The highest, lowest, and average segmentation accuracies of all 100 frames in the entire sequence are shown in Table 4.

Fig. 10

Segmentation results of the representative five consecutive frames (top to bottom) in the marathon sequence: (a) ground truth; (b) segmentation results of the proposed ICS-LTDS; (c) segmentation results of LTDS; (d) segmentation results of LPD; and (e) segmentation results of CF algorithm.


Table 4

Highest, lowest, and average segmentation accuracies of all 100 frames in the Marathon sequence.

(Each cell: Maximum / Minimum / Average SA over the 100 frames.)

| Test sequence | Proposed ICS-LTDS | LTDS20 | LPD22 | CF algorithm32 |
| --- | --- | --- | --- | --- |
| Marathon | 0.5864 / 0.5071 / **0.5462** | 0.5530 / 0.4459 / 0.4939 | 0.5603 / 0.4902 / 0.5226 | 0.4991 / 0.4501 / 0.4680 |
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

From the results in Figs. 7–10 and Tables 1–4, the following points are clear: (1) As the principles of particle flow and streak flow are similar, with both processing motion information recorded over a period, the segmentation results of LPD and streak flow are similar. The proposed ICS-LTDS is an improvement of LTDS, so the segmentation results of these two methods are also similar. (2) Although particle flow and streak flow are highly suitable for segmenting slow-moving targets, when the speed of a target trends downward and approaches zero, both methods assume that the target has not moved for a period and thus abandon its extraction. For example, in the highway traffic sequence (last row in Fig. 9), the traffic that decelerates and gradually stops because of congestion is not segmented by LPD. In contrast, as ICS-LTDS and LTDS are based on real-time optical flow, both segment these targets. (3) As the prior knowledge-based trajectory tracking method operates on the sparse optical flow field formed by feature points, it is only suitable for the motion segmentation of people in extremely crowded places. For a scene where the feature points are not easily extracted or the crowd is loosely distributed, missed segmentation is highly likely, as shown in the last row of Figs. 7(a)–7(d). (4) As discussed in Sec. 3.3, because the objective function of LTDS only distinguishes the direction differences between vectors, regardless of the magnitude differences, it is prone to producing undersegmentation between foreground and background. This phenomenon is particularly evident in the segmentation results for the roundabout and highway traffic sequences in Fig. 9(c) and the marathon sequence in Fig. 10(c), and it results in a lower SA than that of ICS-LTDS with its improved objective function. (5) As TCA needs to record the complete trajectories of the keypoints, it cannot cope with short-term changes of the moving regions, which easily produces undersegmentation between the foreground and background, as shown in Figs. 8(c) and 9(e). (6) Except for the pilgrims and kabba sequences, the SA of ICS-LTDS is higher than that of the other methods, indicating that the ICS-LTDS algorithm is more suitable for the segmentation of high-density slow-moving crowds than existing flow-based and nonflow-based crowd segmentation algorithms.

4.2.

Segmentation of Medium- or Low-Density Crowds with Complex Motion Modes

Algorithm 1 is applied to medium- or low-density crowds with complex motion modes, as described in Sec. 3.5, in the GMCM corresponding to each initial region by using the evolution equation [Eq. (18)] based on the objective function of ICS-TDS (β = 9.9969). In all experiments, the average vector of the initial region is taken as b̂(τ) to calculate the GMCM. The other parameters, such as THF̄, THM̄, THD̄, γ, and μ, are set to the same values as in ICS-LTDS.

In LTDS, the evolution of the objective function is performed in the LMCM, which is quite different from ICS-TDS. Therefore, in addition to LTDS, segmentation results of TDS, which also performs the evolution in the GMCM, are included for comparison. However, as the basic TDS proposed in Ref. 36 is only applicable to the unit vector field, it cannot be directly applied to actual scenes such as those shown in Fig. 11. To make TDS usable for crowd segmentation in an actual motion field, and by analogy with LTDS, we improve TDS by normalizing the motion vectors and adding the foreground motion area constraint to its objective function:

Eq. (22)

J(\tau) = \int_{\Omega(\tau)} \left( a(\tau) \cdot \frac{U(x)}{\|U(x)\|} \right)^2 dx - \mu \int_{\Omega(\tau)} G(\|U(x)\|) \, dx.
The TDS defined by Eq. (22) can be called the TDS with G, abbreviated as TDS-G. According to Eqs. (4) and (7), the evolution equation of TDS-G can be defined as follows:

Eq. (23)

\begin{cases} \dfrac{\partial C}{\partial \tau}(\tau) = \left( \left( a(\tau) \cdot \dfrac{U}{\|U\|} \right)^2 - \mu G \right) N_\tau \\ C(\tau = 0) = C_0. \end{cases}

In addition, among other methods based on flow field modeling, the particle flow-based LPD is selected for algorithm comparison. To facilitate the comparison among algorithms, the common parameters of ICS-TDS, TDS-G, and LTDS are given the same values. The representative segmentation results of ICS-TDS, TDS-G, LTDS, and LPD applied to four image sequences are shown in Fig. 11, and the segmentation accuracies are shown in Table 5.

Fig. 11

Representative segmentation result. Image sequences used for testing (top to bottom): crosswalk, UCSD pedestrians, Fudan pedestrians, and multidirectional pedestrians (MD pedestrians). (a) Ground truth images; (b) segmentation results of proposed ICS-TDS; (c) segmentation results of TDS-G; (d) segmentation results of LTDS; and (e) segmentation results of LPD.


Table 5

SA of the proposed ICS-TDS, TDS-G, LTDS, and LPD for the scenes in Fig. 11.

(Each cell: Red / Green / Average SA; MD pedestrians reports only the direction-weighted average of Eq. (26).)

| Test sequence | Proposed ICS-TDS | TDS-G36 | LTDS20 | LPD22 |
| --- | --- | --- | --- | --- |
| Crosswalk | 0.3632 / 0.3710 / **0.3671** | 0.3016 / 0.3170 / 0.3093 | 0.2226 / 0.2825 / 0.2526 | 0.2081 / 0.2423 / 0.2252 |
| UCSD pedestrians | 0.4343 / 0.4717 / **0.4530** | 0.3360 / 0.4463 / 0.3912 | 0.3224 / 0.4676 / 0.3950 | 0.3639 / 0.5273 / 0.4456 |
| Fudan pedestrians | 0.4628 / 0.3191 / **0.3910** | 0.3573 / 0.2970 / 0.3272 | 0.3386 / 0.1422 / 0.2404 | 0.3884 / 0.3291 / 0.3588 |
| MD pedestrians | **0.5928** | 0.5554 | 0.4959 | 0.4049 |
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

It should be noted that in a close-up scene containing multidirectional pedestrians, as shown in the fourth row of Fig. 11, the term T_i in Eq. (20) is difficult to estimate owing to the numerous moving directions of the pedestrians, mutual occlusion, and even differences in the moving directions of different parts of the same body; this makes it impossible to calculate the SA separately for each direction according to Eq. (20). Considering that the SA should simultaneously measure the similarity in both area and direction between the segments and the ground truth, on the basis of Eq. (20), we add a direction consistency measure between Ω_i and Ω_i ∩ T, so that the mean SA over all directions can be calculated directly without segmenting the ground truth T by direction. The SA after adding the direction consistency measure is as follows:

Eq. (24)

A_{avg}(\Omega, T) = \frac{\sum_{\Omega_i \in \Omega} |\Omega_i \cap T| \cdot S_\varphi(\Omega_i, \Omega_i \cap T)}{|\Omega \cup T|} \times 100\% = \frac{\sum_{\Omega_i \in \Omega} |\Omega_i \cap T| \cdot S_\varphi(\Omega_i, \Omega_i \cap T)}{|(\cup_i \Omega_i) \cup T|} \times 100\%,
where S_φ(Ω_i, Ω_i ∩ T) is the direction consistency measure between Ω_i and Ω_i ∩ T, and its value should be mapped to [0, 1] so that it can be calculated by means of CS as follows:

Eq. (25)

S_\varphi(\Omega_i, \Omega_i \cap T) = \frac{1}{|\Omega_i \cap T|} \sum_{j \in (\Omega_i \cap T)} \frac{1}{2}\big[1 + CS(b_i, x_T(j))\big],
where b_i is the average vector of the segmented region Ω_i and x_T(j) is the vector of the ground truth T at location j. Substituting Eq. (25) into Eq. (24) gives the final form of A_avg, which is used to calculate the SA of the results in the fourth row of Fig. 11:

Eq. (26)

A_{avg}(\Omega, T) = \frac{\sum_{\Omega_i \in \Omega} \sum_{j \in (\Omega_i \cap T)} \frac{1}{2}\big[1 + CS(b_i, x_T(j))\big]}{|(\cup_i \Omega_i) \cup T|} \times 100\%.
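A discrete sketch of Eq. (26) follows; it assumes boolean masks for the segments Ω_i and the ground-truth foreground T, the estimated flow field `U` (from which each b_i is averaged), and a ground-truth vector field `U_gt` giving x_T(j). All names are our own conventions.

```python
import numpy as np

def direction_weighted_sa(regions, U, U_gt, gt_mask, eps=1e-12):
    """Direction-weighted SA of Eq. (26), illustrative sketch.

    regions: list of boolean masks, one per segment Omega_i;
    U: (H, W, 2) flow used for segmentation (defines b_i);
    U_gt: (H, W, 2) ground-truth vectors x_T; gt_mask: foreground T.
    """
    union = gt_mask.copy()
    num = 0.0
    for omega in regions:
        union |= omega
        overlap = omega & gt_mask
        if not overlap.any():
            continue
        b = U[omega].mean(axis=0)                         # average vector b_i
        vecs = U_gt[overlap]                              # x_T(j), j in Omega_i ∩ T
        cs = vecs @ b / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(b) + eps)
        num += (0.5 * (1.0 + cs)).sum()                   # inner sum of Eq. (26)
    return 100.0 * num / union.sum()                      # denominator |(U Omega_i) U T|
```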

To compare the segmentation results of the proposed ICS-TDS algorithm with those of typical nonflow-based methods on crowds with medium or low density and complex motion modes, we selected the crosswalk and MD pedestrians sequences as test sequences and applied the proposed ICS-TDS, TCA,34 and the CF-based algorithm32 to them. Representative segmentation results are shown in Fig. 12, and the segmentation accuracies calculated according to Eq. (20) [Eq. (26) for MD pedestrians] are shown in Table 6. For TCA, the parameter settings were as follows: p = 16, r = 40, τ = 30, Δ = 100, δ = 1, α = 0.1, β = 20, and γ = 150; for the CF algorithm, K = 20, z = 0.025, the upper bound of ϕ was taken as 1, and the threshold coefficient α as 0.5.

Fig. 12

Representative segmentation result. Image sequences used for testing (top to bottom): crosswalk and MD pedestrians. (a) Ground truth images; (b) segmentation results of proposed ICS-TDS; (c) segmentation results of TCA; and (d) segmentation results of CF algorithm.


Table 6

Segmentation accuracies of the proposed ICS-TDS, TCA, and CF algorithm for the scenes in Fig. 12.

(Each cell: Red / Green / Average SA; MD pedestrians reports only the direction-weighted average of Eq. (26).)

| Test sequence | Proposed ICS-TDS | TCA34 | CF algorithm32 |
| --- | --- | --- | --- |
| Crosswalk | 0.3632 / 0.3710 / **0.3671** | 0.1583 / 0.1706 / 0.1645 | 0.3072 / 0.3655 / 0.3364 |
| MD pedestrians | **0.5928** | 0.3924 | 0.3655 |
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

Similar to Sec. 4.1, we manually obtained the ground truth of 100 consecutive frames in the crosswalk sequence to evaluate sequences containing fast-moving crowds with complex motion modes. The segmentation results of five representative consecutive frames from the three best-performing flow-based algorithms (i.e., the proposed ICS-TDS, TDS-G,36 and LPD22) and the nonflow-based CF algorithm32 are shown in Fig. 13. The highest, lowest, and average segmentation accuracies of all 100 frames in the entire sequence are shown in Table 7.

Fig. 13

Segmentation results of the representative five consecutive frames (top to bottom) in the crosswalk sequence: (a) ground truth; (b) segmentation results of proposed ICS-TDS; (c) segmentation results of TDS-G; (d) segmentation results of LPD; and (e) segmentation results of CF algorithm.


Table 7

Highest, lowest, and average segmentation accuracies of all 100 frames in the crosswalk sequence.

| Test sequence | Method | Maximum (Red / Green) | Minimum (Red / Green) | Average |
| --- | --- | --- | --- | --- |
| Crosswalk | Proposed ICS-TDS | 0.374 / 0.373 | 0.313 / 0.319 | **0.345** |
| | TDS-G36 | 0.336 / 0.320 | 0.205 / 0.251 | 0.272 |
| | LPD22 | 0.312 / 0.286 | 0.183 / 0.155 | 0.249 |
| | CF algorithm32 | 0.335 / 0.373 | 0.296 / 0.283 | 0.319 |
Note: Bold values indicate the maximum of the average accuracy of the different methods for the same test sequence.

From the results shown in Figs. 11–13 and Tables 5–7, the following points are clear: (1) The particle flow-based LPD algorithm is not suitable for segmenting crowds with a loose distribution, especially crowds with complex motion modes. For example, in the segmentation results of crosswalk, as shown in the first row of Fig. 11(e) and the last row of Fig. 13(d), there are obvious missing or incorrect segmentations. Meanwhile, in the segmentation results of multidirectional pedestrians, several pedestrians with distinctly different motion directions are segmented into the same region, resulting in severe undersegmentation [as shown in Fig. 14, a zoomed-in version of the last row of Fig. 11(e)]. (2) As the LMCM is based on the motion consistency between local regions, when pedestrians with little difference in motion direction occlude each other, their motion boundaries are not clear enough in the LMCM, which makes LTDS prone to producing undersegmentation for mutually occluding pedestrians, as shown in Fig. 15 [a zoomed-in version of the last row of Fig. 11(d)]. (3) Although TDS-G and LTDS have almost the same objective function, TDS-G evolves in the GMCM while LTDS evolves in the LMCM. Except for UCSD pedestrians, the segmentation results of TDS-G are better than those of LTDS, indicating that evolution in the GMCM is more suitable for segmenting medium- or low-density crowds with complex motion modes. However, as TDS-G has the same objective function as LTDS, which only distinguishes the vector direction but not the vector magnitude, the undersegmentation between foreground and background caused by the objective function remains serious in the scenes shown in Figs. 11 and 13; therefore, their SA is significantly lower than that of ICS-TDS. (4) As TCA cannot cope with short-term changes in the moving regions, when it is applied to crowd scenes containing complex motion patterns, the segmentation results still contain severe undersegmentation, as shown in Fig. 16(b) [a zoomed-in version of the last row of Fig. 12(c)]. Meanwhile, as coherent neighbor invariance mainly focuses on pairwise motion consistency and ignores the motion differences among all the keypoints in the local region of the center point, the CF algorithm also easily produces undersegmentation when applied to crowd scenes containing complex motion patterns, as shown in Fig. 16(c) [a zoomed-in version of the last row of Fig. 12(d)]. (5) For the four scenes shown in Fig. 11, ICS-TDS is superior to the other methods in terms of both segmentation effect and SA, indicating that the ICS-TDS algorithm is more suitable for segmenting medium- or low-density crowds with complex motion modes than existing flow-based and nonflow-based crowd motion segmentation algorithms.

Fig. 14

Undersegmentation results produced by LPD: (a) ground truth and (b) segmentation results of LPD. In panels (a) and (b), the same ellipse corresponds to the same pedestrians. In the ground truth, the pedestrians in blue and yellow ellipses have at least two motion directions, but in the segmentation results of LPD, the pedestrians in each ellipse have the same motion direction.


Fig. 15

Undersegmentation results produced by LTDS. (a) Ground truth and (b) segmentation results of LTDS. In panels (a) and (b), the same ellipse corresponds to the same pedestrian(s). In the ground truth, the pedestrians in blue, yellow, and red ellipses have two motion directions, but in the segmentation results of LTDS, the pedestrians in each ellipse have the same motion direction.


Fig. 16

Undersegmentation results produced by TCA and CF algorithm. (a) Ground truth; (b) segmentation results of TCA; and (c) segmentation results of CF algorithm. In panels (a), (b), and (c), the same ellipse corresponds to the same pedestrian(s). In the ground truth, the pedestrians in blue, yellow, and purple ellipses have at least two motion directions, but in the segmentation results of TCA and CF algorithm, the pedestrians in each ellipse have the same motion direction.


5.

Conclusion

In this study, a TDS model based on ICS was proposed; this model can be used to segment moving crowds with different crowding levels and complex motion modes. The method reconstructs the objective function of the TDS model by ICS so that the objective function can simultaneously measure the differences in both magnitude and direction between two vectors; thus, it can effectively avoid undersegmentation between the foreground and background due to the difference in magnitude. By switching between the “localization” and “globalization” modes of the objective function, i.e., performing the region evolution in the LMCM or the GMCM, the algorithm can be applied to segment crowds with different densities and motion states. In addition, by simultaneously introducing the local motion magnitude threshold and the local frame difference magnitude threshold, nonforeground regions can be excluded from the initial regions of evolution; thus, the undersegmentation and mis-segmentation caused by erroneous initial regions are greatly reduced. The experimental results showed that for a variety of complex scenes containing moving crowds, the segmentation performance and accuracy of the proposed ICS-LTDS/ICS-TDS crowd motion segmentation method are superior to those of existing flow field-based motion segmentation methods.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61601156, 61701146, and 61871164) and the Zhejiang Provincial Natural Science Foundation of China (Grant No. Q16F010019). The authors declare that they have no competing interests.

References

1. H. Chaudhry et al., "Crowd region detection in outdoor scenes using color spaces," Int. J. Model. Simul. Sci. Comput. 9(2), 1850012 (2018). https://doi.org/10.1142/S1793962318500125

2. D. Peleshko et al., "Design and implementation of visitors queue density analysis and registration method for retail video surveillance purposes," in Proc. IEEE First Int. Conf. Data Stream Min. Process. (DSMP), 159–162 (2016). https://doi.org/10.1109/DSMP.2016.7583531

3. J. Luo et al., "Real-time people counting for indoor scenes," Signal Process. 124, 27–35 (2016). https://doi.org/10.1016/j.sigpro.2015.10.036

4. M. Sajid, A. Hassan, and S. A. Khan, "Crowd counting using adaptive segmentation in a congregation," in Proc. IEEE Int. Conf. Signal and Image Process. (ICSIP), 745–749 (2016). https://doi.org/10.1109/SIPROCESS.2016.7888363

5. Z. Zhang, M. Wang, and X. Geng, "Crowd counting in public video surveillance by label distribution learning," Neurocomputing 166, 151–163 (2015). https://doi.org/10.1016/j.neucom.2015.03.083

6. X. Hu et al., "A novel approach for crowd video monitoring of subway platforms," Optik 124(22), 5301–5306 (2013). https://doi.org/10.1016/j.ijleo.2013.03.057

7. B. Sirmacek and P. Reinartz, "Automatic crowd density and motion analysis in airborne image sequences based on a probabilistic framework," in Proc. IEEE Int. Conf. Comput. Vision Workshops (ICCV Workshops), 898–905 (2011). https://doi.org/10.1109/ICCVW.2011.6130347

8. A. G. Abuarafah, M. O. Khozium, and E. AbdRabou, "Real-time crowd monitoring using infrared thermal video sequences," J. Am. Sci. 8(3), 133–140 (2012).

9. T. Li et al., "Crowded scene analysis: a survey," IEEE Trans. Circuits Syst. Video Technol. 25(3), 367–386 (2015). https://doi.org/10.1109/TCSVT.2014.2358029

10. A. Dehghan et al., "Automatic detection and tracking of pedestrians in videos with various crowd densities," in Pedestrian and Evacuation Dynamics, 3–19, Springer, Zurich, Switzerland (2012).

11. H. Ullah et al., "Density independent hydrodynamics model for crowd coherency detection," Neurocomputing 242, 28–39 (2017). https://doi.org/10.1016/j.neucom.2017.02.023

12. D. Zhang et al., "High-density crowd behaviors segmentation based on dynamical systems," Multimedia Syst. 23(5), 599–606 (2017). https://doi.org/10.1007/s00530-016-0520-y

13. W. Ge, R. T. Collins, and R. B. Ruback, "Vision-based analysis of small groups in pedestrian crowds," IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 1003–1016 (2012). https://doi.org/10.1109/TPAMI.2011.176

14. L. Dong et al., "Fast crowd segmentation using shape indexing," in Proc. 11th IEEE Int. Conf. Comput. Vision, 1–8 (2007). https://doi.org/10.1109/ICCV.2007.4409075

15. M. Hu, S. Ali, and M. Shah, "Learning motion patterns in crowded scenes using motion flow field," in Proc. Int. Conf. Pattern Recognit., 1–5 (2008). https://doi.org/10.1109/ICPR.2008.4761183

16. L. Zhang et al., "Crowd segmentation method based on trajectory tracking and prior knowledge learning," Arabian J. Sci. Eng. 43, 7143–7152 (2018). https://doi.org/10.1007/s13369-017-2995-z

17. D. Cremers and S. Soatto, "Motion competition: a variational approach to piecewise parametric motion segmentation," Int. J. Comput. Vision 62(3), 249–265 (2005). https://doi.org/10.1007/s11263-005-4882-4

18. T. Brox et al., "Colour, texture, and motion in level set based segmentation and tracking," Image Vision Comput. 28(3), 376–390 (2010). https://doi.org/10.1016/j.imavis.2009.06.009

19. A. S. Rao et al., "Crowd event detection on optical flow manifolds," IEEE Trans. Cybern. 46(7), 1524–1537 (2016). https://doi.org/10.1109/TCYB.2015.2451136

20. S. Wu and H. S. Wong, "Crowd motion partitioning in a scattered motion field," IEEE Trans. Syst. Man Cybern. Part B 42(5), 1443–1454 (2012). https://doi.org/10.1109/TSMCB.2012.2192267

21. Y. Yuan, J. Fang, and Q. Wang, "Online anomaly detection in crowd scenes via structure analysis," IEEE Trans. Cybern. 45(3), 548–561 (2015). https://doi.org/10.1109/TCYB.2014.2330853

22. S. Ali and M. Shah, "A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR), 1–6 (2007). https://doi.org/10.1109/CVPR.2007.382977

23. S. Mukherjee, D. Goswami, and S. Chatterjee, "A Lagrangian approach to modeling and analysis of a crowd dynamics," IEEE Trans. Syst. Man Cybern. 45(6), 865–876 (2015). https://doi.org/10.1109/TSMC.2015.2389763

24. R. Mehran, B. E. Moore, and M. Shah, "A streakline representation of flow in crowded scenes," Lect. Notes Comput. Sci. 6313, 439–452 (2010). https://doi.org/10.1007/978-3-642-15558-1

25. X. Wang et al., "A high accuracy flow segmentation method in crowded scenes based on streakline," Optik 125(3), 924–929 (2014). https://doi.org/10.1016/j.ijleo.2013.07.166

26. M. Gao et al., "Crowd motion segmentation and behavior recognition fusing streak flow and collectiveness," Opt. Eng. 57(4), 043109 (2018). https://doi.org/10.1117/1.OE.57.4.043109

27. A. B. Chan and N. Vasconcelos, "Modeling, clustering, and segmenting video with mixtures of dynamic textures," IEEE Trans. Pattern Anal. Mach. Intell. 30(5), 909–926 (2008). https://doi.org/10.1109/TPAMI.2007.70738

28. V. Mahadevan et al., "Anomaly detection in crowded scenes," in Proc. IEEE Conf. Comput. Vision and Pattern Recognit. (CVPR), 1975–1981 (2010). https://doi.org/10.1109/CVPR.2010.5539872

29. S. Duan, X. Wang, and X. Yu, "Crowded abnormal detection based on mixture of kernel dynamic texture," in Proc. Int. Conf. Audio, Language and Image Process., 931–936 (2014). https://doi.org/10.1109/ICALIP.2014.7009931

30. Y. Ma, P. Cisar, and A. Kembhavi, "Motion segmentation and activity representation in crowds," Int. J. Imaging Syst. Technol. 19(2), 80–90 (2009). https://doi.org/10.1002/ima.v19:2

31. B. Zhou, X. Tang, and X. Wang, "Coherent filtering: detecting coherent motions from crowd clutters," Lect. Notes Comput. Sci. 7573, 857–871 (2012). https://doi.org/10.1007/978-3-642-33709-3

32. B. Zhou et al., "Measuring crowd collectiveness," IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1586–1599 (2014). https://doi.org/10.1109/TPAMI.2014.2300484

33. Z. Fan et al., "Adaptive crowd segmentation based on coherent motion detection," J. Signal Process. Syst. 90(12), 1651–1666 (2018). https://doi.org/10.1007/s11265-017-1309-8

34. R. Sharma and T. Guha, "A trajectory clustering approach to crowd flow segmentation in videos," in Proc. IEEE Int. Conf. Image Process. (ICIP), 1200–1204 (2016). https://doi.org/10.1109/ICIP.2016.7532548

35. G. Doretto et al., "Dynamic textures," Int. J. Comput. Vision 51(2), 91–109 (2003). https://doi.org/10.1023/A:1021669406132

36. T. Roy et al., "Segmentation of a vector field: dominant parameter and shape optimization," J. Math. Imaging Vision 24(2), 259–276 (2006). https://doi.org/10.1007/s10851-005-3627-x

37. H. Li, W. Chen, and I.-F. Shen, "Segmentation of discrete vector fields," IEEE Trans. Visual Comput. Graphics 12(3), 289–300 (2006). https://doi.org/10.1109/TVCG.2006.54

38. T. Brox et al., "High accuracy optical flow estimation based on a theory for warping," Lect. Notes Comput. Sci. 3024, 25–36 (2004). https://doi.org/10.1007/b97873

39. P. Jaccard, "The distribution of the flora of the Alpine zone," New Phytol. 11(2), 37–50 (1912). https://doi.org/10.1111/nph.1912.11.issue-2

Biography

Haibin Yu received his PhD degree in communication and information system from Zhejiang University in 2007. He is currently an associate professor in the College of Electronics and Information, Hangzhou Dianzi University. He was a visiting scholar at the Department of Neurosurgery, University of Pittsburgh, from November 2016 to November 2017. His research interests include computer vision, image processing, machine learning, and multisensor fusion.

Guoxiong Pan received his undergraduate degree in electronics and information engineering from Hangzhou Dianzi University in 2017. He is currently a graduate student in electronic science and technology at Hangzhou Dianzi University. His research interests include computer vision, image processing, and embedded systems.

Li Zhang received her PhD degree in communication and information system from Zhejiang University in 2008. She is currently a teacher in the School of Computer Science and Technology, Hangzhou Dianzi University. Her research interests include computer vision, image processing, and deep learning.

Zhu Li received his PhD degree from the Tokyo University of Agriculture and Technology. He is currently an associate professor in the College of Electronics and Information, Hangzhou Dianzi University. His research interests include computer vision, image processing, and machine learning.

Mian Pan received his PhD degree in electronic engineering from Xidian University in 2013. He is currently a lecturer in the College of Electronics and Information, Hangzhou Dianzi University. His main research interests include statistical signal processing, adaptive signal processing, computer vision, image processing, machine learning, and their applications in industry.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Haibin Yu, Guoxiong Pan, Li Zhang, Zhu Li, and Mian Pan "Translation domain segmentation model based on improved cosine similarity for crowd motion segmentation," Journal of Electronic Imaging 28(2), 023011 (13 March 2019). https://doi.org/10.1117/1.JEI.28.2.023011
Received: 22 September 2018; Accepted: 26 February 2019; Published: 13 March 2019