Personnel shortages in the military sector require deploying soldiers as effectively as possible. Increased vehicle automation, e.g. for displacements or for resupply convoys, can improve this effectiveness by lowering the mental load needed for driving. Drivers of automated vehicles resemble passengers and are therefore more susceptible to motion sickness than drivers of non-autonomous vehicles. It is useful to monitor for motion sickness to ensure personnel arrive fit for duty at their destination. Therefore, a system to automatically detect the presence of motion sickness would be beneficial. In this paper, we introduce a camera-based system that uses electro-optical (EO) and infrared (IR) video to monitor facial skin temperature and respiratory rate as a step towards camera-based motion sickness monitoring in autonomous vehicles. Our proof-of-concept system obtained sufficient measurement accuracy for use in an experimental setting in which participants were subjected to a condition that induced motion sickness. We discuss the successes and challenges encountered during system setup and data analysis, and share insights relevant to the envisioned application in an autonomous vehicle. Specifically, we compare recordings with and without subject motion caused by the motion-sickness-inducing condition and discuss measurement inaccuracies that might be encountered because of IR thermal drift. Additionally, we reflect on obstacles that can arise when employing an EO/IR monitoring system in a military context.
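As an illustration of the kind of signal processing involved, the sketch below shows one way a respiratory rate could be estimated from a thermal nostril-region temperature trace; the sampling rate, region-of-interest extraction, and band limits are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def respiratory_rate_bpm(nostril_temps, fps=30.0):
    """Estimate respiratory rate (breaths/min) from a 1-D trace of mean
    nostril-region temperature sampled from IR frames at `fps` Hz.
    Breathing modulates the temperature of in/exhaled air, producing a
    periodic signal whose dominant frequency is the respiratory rate."""
    x = np.asarray(nostril_temps, dtype=float)
    x = x - x.mean()                       # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x))      # magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    # Restrict to a plausible breathing band (0.1-0.7 Hz, i.e. 6-42 bpm).
    band = (freqs >= 0.1) & (freqs <= 0.7)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0
```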
The military is looking to adopt artificial intelligence (AI)-based computer vision for autonomous systems and decision-support. This transition requires test methods to ensure safe and effective use of such systems. Performance assessment of deep learning (DL) models, such as object detectors, typically requires extensive datasets. Simulated data offers a cost-effective alternative for generating large image datasets, without the need for access to potentially restricted operational data. However, to effectively use simulated data as a virtual proxy for real-world testing, the suitability and appropriateness of the simulation must be evaluated. This study evaluates the use of simulated data for testing DL-based object detectors, focusing on three key aspects: comparing performance on real versus simulated data, assessing the cost-effectiveness of generating simulated datasets, and evaluating the accuracy of simulations in representing reality. Using two automotive datasets, one publicly available (KITTI) and one internally developed (INDEV), we conducted experiments with both real and simulated versions. We found that although simulations can approximate real-world performance, evaluating whether a simulation accurately represents reality remains challenging. Future research should focus on developing validation approaches independent of real-world datasets to enhance the reliability of simulations in testing AI models.
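To make the performance comparison concrete, the sketch below shows a minimal IoU-based matching of detections to ground truth, yielding the precision and recall that can be computed separately on a real dataset and on its simulated counterpart; the box format and thresholds are illustrative assumptions, not the evaluation protocol used in the study.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall(preds, gts, iou_thr=0.5):
    """Greedy one-to-one matching of predictions (dicts with 'box' and
    'score') to ground-truth boxes at a fixed IoU threshold."""
    matched, tp = set(), 0
    for p in sorted(preds, key=lambda b: -b["score"]):
        best, best_iou = None, iou_thr
        for i, g in enumerate(gts):
            if i not in matched and iou(p["box"], g) >= best_iou:
                best, best_iou = i, iou(p["box"], g)
        if best is not None:
            matched.add(best)
            tp += 1
    return tp / max(len(preds), 1), tp / max(len(gts), 1)

# Running the same detector on a real image set and on its simulated
# counterpart and comparing the resulting precision/recall is the kind
# of per-dataset comparison described above.
```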
We propose a real-time change detection system to be used as a vehicle-mounted early-warning system for indicators of improvised explosive devices. Within the context of military route clearance, the system automatically detects suspicious changes in the environment with respect to a previous patrol. For this purpose, historical images of the live scene are retrieved from a database and registered to the live image through 2.5-D view synthesis, using the three-dimensional (3-D) scene geometry acquired from a stereo camera. Changes are then found using local-area statistics in the CIE-Lab color space. A set of spatiotemporal filters is used to reject irrelevant alarms, resulting in a limited set of confident changes to be presented to the operator through an interactive graphical user interface. In addition to the algorithmic contributions, we elaborate on the real-time design, featuring graphics processing units (GPUs) for the most time-consuming processing tasks, a pipelined architecture to increase system throughput, and a split of the system into a live and an offline processing chain. This way, real-time change detection at 3.5 fps is achieved on images of 1920 × 1440 pixels. Finally, an extensive system validation featuring realistic experiments shows promising detection capabilities and robustness to, e.g., lateral displacements of up to 6 m.
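As a rough illustration of change detection with local-area statistics in the CIE-Lab color space, the sketch below compares local Lab means of a registered historical image and the live image; the window size and threshold are illustrative values, not the parameters used in the system, and the view synthesis, registration, and spatiotemporal filtering steps are not shown.

```python
import cv2
import numpy as np

def change_mask(live_bgr, hist_bgr, window=15, thresh=12.0):
    """Flag pixels whose local-mean CIE-Lab color differs strongly between
    the (already registered) historical image and the live image."""
    live_lab = cv2.cvtColor(live_bgr, cv2.COLOR_BGR2Lab).astype(np.float32)
    hist_lab = cv2.cvtColor(hist_bgr, cv2.COLOR_BGR2Lab).astype(np.float32)
    # Local-area statistics: mean Lab value in a window around each pixel.
    live_mean = cv2.boxFilter(live_lab, ddepth=-1, ksize=(window, window))
    hist_mean = cv2.boxFilter(hist_lab, ddepth=-1, ksize=(window, window))
    # Euclidean distance in Lab space between the local means.
    dist = np.linalg.norm(live_mean - hist_mean, axis=2)
    return (dist > thresh).astype(np.uint8) * 255
```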
The information available online and offline, from open as well as from private sources, is growing at an exponential rate and places an increasing demand on the limited resources of Law Enforcement Agencies (LEAs). The absence of appropriate tools and techniques to collect, process, and analyze the volumes of complex and heterogeneous data has created a severe information overload. If a solution is not found, the impact on law enforcement will be dramatic, e.g. because important evidence is missed or the investigation time is too long. Furthermore, there is an uneven level of capabilities to deal with the large volumes of complex and heterogeneous data that come from multiple open and private sources at national level across the EU, which hinders cooperation and information sharing. Consequently, there is a pressing need to develop tools, systems and processes which expedite online investigations. In this paper, we describe a suite of analysis tools to identify and localize generic concepts, instances of objects and logos in images, which constitute a significant portion of everyday law enforcement data. We describe how incremental learning based on only a few examples and large-scale indexing are addressed in both concept detection and instance search. Our search technology allows querying of the database by visual examples and by keywords. Our tools are packaged in a Docker container to guarantee easy deployment on a system, and they exploit possibilities provided by open source toolboxes, contributing to the technical autonomy of LEAs.
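As an illustration of query-by-visual-example over a large index, the sketch below performs cosine-similarity search over precomputed image embeddings; the embedding extraction step (e.g. features from a concept or instance detector) is assumed and not shown, and this is a generic retrieval building block rather than the indexing scheme used by the described tools.

```python
import numpy as np

def build_index(embeddings):
    """L2-normalise a matrix of image embeddings (n_images x dim) so that
    a dot product equals cosine similarity."""
    emb = np.asarray(embeddings, dtype=np.float32)
    return emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-9)

def query_by_example(index, query_embedding, top_k=10):
    """Return the indices and similarity scores of the database images
    most similar to a visual query."""
    q = np.asarray(query_embedding, dtype=np.float32)
    q = q / (np.linalg.norm(q) + 1e-9)
    scores = index @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]
```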
Video analytics is essential for managing large quantities of raw data that are produced by video surveillance systems (VSS) for the prevention, repression and investigation of crime and terrorism. Analytics is highly sensitive to changes in the scene and in the optical chain, so a VSS with analytics needs careful configuration and prompt maintenance to avoid false alarms. However, there is a trend from static VSS consisting of fixed CCTV cameras towards more dynamic VSS deployments over public/private multi-organization networks, consisting of a wider variety of visual sensors, including pan-tilt-zoom (PTZ) cameras, body-worn cameras and cameras on moving platforms. This trend will lead to more dynamic scenes and more frequent changes in the optical chain, creating structural problems for analytics. If these problems are not adequately addressed, analytics will not be able to continue to meet end users’ developing needs. In this paper, we present a three-part solution for managing the performance of complex analytics deployments. The first part is a register containing metadata describing relevant properties of the optical chain, such as intrinsic and extrinsic calibration, and parameters of the scene such as lighting conditions or measures for scene complexity (e.g. number of people). A second part frequently assesses these parameters in the deployed VSS, stores changes in the register, and signals relevant changes in the setup to the VSS administrator. A third part uses the information in the register to dynamically configure analytics tasks based on VSS operator input. In order to support the feasibility of this solution, we give an overview of related state-of-the-art technologies for autocalibration (self-calibration), scene recognition and lighting estimation in relation to person detection. The presented solution allows for rapid and robust deployment of Video Content Analysis (VCA) tasks in large-scale ad hoc networks.
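A minimal sketch of such a register is given below: one record per camera holding optical-chain and scene parameters, plus an update step that reports which parameters changed so the administrator can be signalled. The field names and types are illustrative assumptions, not the schema proposed in the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CameraRegisterEntry:
    """Illustrative register record for one camera in the VSS."""
    camera_id: str
    intrinsics: Optional[list] = None         # e.g. 3x3 K matrix as nested lists
    extrinsics: Optional[list] = None         # e.g. 4x4 camera pose matrix
    lighting_lux: Optional[float] = None      # estimated scene illumination
    scene_complexity: Optional[float] = None  # e.g. mean number of people

def update_entry(register, camera_id, **measured):
    """Store newly assessed parameters and return which ones changed,
    so relevant changes can be signalled to the VSS administrator."""
    entry = register.setdefault(camera_id, CameraRegisterEntry(camera_id))
    changed = {}
    for name, value in measured.items():
        if getattr(entry, name) != value:
            changed[name] = (getattr(entry, name), value)
            setattr(entry, name, value)
    return changed
```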
Naval ships have camera systems available to assist in performing their operational tasks. Some include automatic detection and tracking, assisting an operator by keeping a ship in view or by maintaining the information collected about individual ships. Tracking errors limit the use of camera information. When keeping a ship in view, an operator has to re-target a tracked ship if it is no longer automatically followed due to a track break, or if it is out of view. When several ships are being followed, tracking errors require the operator to re-label objects.
Trackers make errors, for example due to inaccurate detections or motion that is not modeled correctly. Instead of improving this tracking using the limited information available from a single measurement, we propose a method where tracks are merged at a later stage, using information over a small interval. This merging is based on spatiotemporal matching. To limit incorrect connections, unlikely connections are identified and excluded. For this we propose two different approaches: spatiotemporal cost functions are used to exclude connections with unlikely motion, and appearance cost functions are used to exclude connections between tracks of dissimilar objects. In addition, spatiotemporal cost functions are also used to select tracks for merging. For the appearance filtering we investigated different descriptive features and developed a method for indicating similarity between tracks. This method handles variations in features due to noisy detections and changes in appearance.
We tested this method on real data with nine different targets. It is shown that track merging results in a significant reduction in the number of tracks per ship. With our method we significantly reduce the number of incorrect track merges that would occur with naïve merging functions.
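As an illustration of the gating idea, the sketch below shows simple spatiotemporal and appearance cost functions that return an infinite or large cost for implausible connections between track fragments; the track representation, thresholds, and descriptor handling are illustrative assumptions rather than the cost functions used in the paper.

```python
import numpy as np

def spatiotemporal_cost(track_a, track_b, max_gap_s=5.0, max_speed=15.0):
    """Gate on whether fragment b can plausibly continue fragment a: b must
    start after a ends within a limited time gap, and the implied motion
    between a's last and b's first position must stay below a plausible
    speed (m/s). Tracks are dicts with hypothetical keys t_start, t_end,
    p_start, p_end."""
    gap = track_b["t_start"] - track_a["t_end"]
    if gap <= 0 or gap > max_gap_s:
        return np.inf                      # exclude: wrong order or too late
    dist = np.linalg.norm(np.asarray(track_b["p_start"], float)
                          - np.asarray(track_a["p_end"], float))
    if dist / gap > max_speed:
        return np.inf                      # exclude: unlikely motion
    return dist / gap                      # lower cost = more likely connection

def appearance_cost(feat_a, feat_b):
    """Cosine distance between per-track appearance descriptors (e.g.
    averaged over detections to suppress noise); a large value excludes
    merging tracks of dissimilar objects."""
    a, b = np.asarray(feat_a, float), np.asarray(feat_b, float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
```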
In the security domain, cameras are important to assess critical situations. Apart from fixed surveillance cameras, we observe an increasing number of sensors on mobile platforms, such as drones, vehicles and persons. Mobile cameras allow rapid and local deployment, enabling many novel applications and effects, such as the reduction of violence between police and citizens. However, the increased use of bodycams also creates potential challenges. For example: how can end-users extract information from the abundance of video, how can the information be presented, and how can an officer retrieve information efficiently? At the same time, such video offers the opportunity to stimulate the professional’s memory and to support complete and accurate reporting. In this paper, we show how video content analysis (VCA) can address these challenges and seize these opportunities. To this end, we focus on methods for creating a complete summary of the video, which allows quick retrieval of relevant fragments. The content analysis for summarization consists of several components, such as stabilization, scene selection, motion estimation, localization, pedestrian tracking and action recognition in the video from a bodycam. The different components and visual representations of summaries are presented for retrospective investigation.
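As a small illustration of one summarization component, the sketch below selects candidate key frames based on a crude inter-frame motion measure; the sampling interval and threshold are illustrative assumptions, and the actual pipeline combines several further components (stabilization, localization, tracking, action recognition) not shown here.

```python
import cv2
import numpy as np

def motion_keyframes(video_path, sample_every=5, motion_thresh=8.0):
    """Pick candidate key frames where inter-frame motion is high, as a
    simple stand-in for the scene-selection step of a video summary."""
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_gray, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev_gray is not None:
                # Mean absolute frame difference as a crude motion measure.
                motion = float(np.mean(cv2.absdiff(gray, prev_gray)))
                if motion > motion_thresh:
                    keyframes.append(idx)
            prev_gray = gray
        idx += 1
    cap.release()
    return keyframes
```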
Compared to open surgery, minimally invasive surgery offers reduced trauma and faster recovery. However, the lack of direct view limits space perception. Stereo-endoscopy improves depth perception, but is still restricted to the direct endoscopic field-of-view. We describe a novel technology that reconstructs 3D-panoramas from endoscopic video streams, providing a much wider cumulative overview. The method is compatible with any endoscope. We demonstrate that it is possible to generate photorealistic 3D-environments from mono- and stereoscopic endoscopy. The resulting 3D-reconstructions can be directly applied in simulators and e-learning. Extended to real-time processing, the method looks promising for telesurgery or other remote vision-guided tasks.
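As an indication of the kind of geometry involved, the sketch below estimates the relative camera pose between two monocular endoscope frames from matched features; it is a generic structure-from-motion building block under assumed calibration (intrinsic matrix K), not the reconstruction method described above, and the translation is recovered only up to scale.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate the camera motion between two monocular frames from
    matched ORB features; chaining such poses and triangulating points
    is one way to grow a wider 3-D reconstruction."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R, t   # rotation and unit-scale translation of frame 2 w.r.t. frame 1
```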
Moving object detection in urban scenes is important for the guidance of autonomous vehicles, robot navigation, and monitoring. In this paper, moving objects are automatically detected using three sequential frames and tracked over a longer period. To this end, we modify the plane+parallax, fundamental matrix, and trifocal tensor algorithms to operate on three sequential frames automatically, and test their ability to detect moving objects in challenging urban scenes. Frame-to-frame correspondences are established with the use of SIFT keys. The keys that are consistently matched over three frames are used by the algorithms to distinguish between static objects and moving objects. The tracking of keys for the detected moving objects increases their reliability over time, which is quantified by our results. To evaluate the three different algorithms, we manually segment the moving objects in real-world data and report the fraction of true positives versus false positives. Results show that the plane+parallax method performs very well on our datasets, and we show that our modification to this method outperforms the original method. The proposed combination of the advanced plane+parallax method with the trifocal tensor method improves moving object detection and tracking for one of the four video sequences.
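As a simplified stand-in for the plane+parallax idea, the sketch below fits a homography to the dominant static structure between two consecutive frames and flags SIFT keypoints with large residuals as candidate moving points; extending the consistency check over three frames, as described above, would further suppress false positives. The thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

def match_pair(img_a, img_b, sift, matcher, ratio=0.75):
    """SIFT correspondences between two frames with Lowe's ratio test."""
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    good = []
    for m, n in matcher.knnMatch(des_a, des_b, k=2):
        if m.distance < ratio * n.distance:
            good.append((kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt))
    return good

def moving_point_candidates(frame0, frame1, residual_thresh=3.0):
    """Fit a homography to the dominant (static) structure between two
    consecutive frames and flag keypoints that deviate from it as
    candidate moving points."""
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher()
    pairs = match_pair(frame0, frame1, sift, matcher)
    pts0 = np.float32([p for p, _ in pairs])
    pts1 = np.float32([q for _, q in pairs])
    H, _ = cv2.findHomography(pts0, pts1, cv2.RANSAC, 3.0)
    proj = cv2.perspectiveTransform(pts0.reshape(-1, 1, 2), H).reshape(-1, 2)
    residual = np.linalg.norm(proj - pts1, axis=1)
    return pts1[residual > residual_thresh]   # candidate moving points in frame1
```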