This PDF file contains the front matter associated with SPIE Proceedings Volume 12276 including the Title Page, Copyright information, Table of Contents, and Conference Committee Page.
Finding and extracting topic-specific information from free-text sources is an important task for classifying and distinguishing the content of information systems. Such compression of information, in which non-relevant text passages can be ignored, is also advantageous for the further machine processing and evaluation of topic-specific documents. State-of-the-art approaches normally use well-trained modern Natural Language Processing (NLP) methods to solve such tasks. However, use cases arise in which no suitable training data sets are available to adequately prepare or fine-tune the NLP methods used. In this paper, we detail a model-driven approach that applies an XML data model to an application-specific scenario and combines different NLP methods into a dynamic, automated NLP pipeline. The goal of this pipeline is the automatic extraction of specific information (related to certain domains or topics) from text documents, allowing structured further processing of this information. Specifically, we consider a scenario in which the extracted information has to be aligned to a given information model that defines, for example, the terms relevant for further processing. The solution approaches described here address a scenario in which information clusters on a specific topic can be obtained from a given data set even without domain-specific model training. The basis is a dynamic (i.e., using different NLP methods and models) and fully automatic (i.e., handling different topics at the same time) pipeline architecture combined with an XML data model. The presented approach details and extends our earlier work and gives new qualitative and first quantitative results.
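As a minimal illustration of the "NLP methods feeding an XML data model" idea (not the authors' pipeline or data model, which are not reproduced here), the sketch below runs a generic spaCy named-entity step and stores the results in a simple XML structure; it assumes the small English spaCy model is installed.

```python
# Minimal sketch: extract named entities with spaCy and map them into a
# simple XML structure (a stand-in for the paper's XML data model).
import xml.etree.ElementTree as ET

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_to_xml(text: str, topic: str) -> ET.Element:
    """Run a generic NER step and write the results into an XML element."""
    doc = nlp(text)
    root = ET.Element("document", attrib={"topic": topic})
    for ent in doc.ents:
        item = ET.SubElement(root, "entity", attrib={"label": ent.label_})
        item.text = ent.text
    return root

root = extract_to_xml("The frigate left Hamburg on 3 May 2022.", topic="maritime")
print(ET.tostring(root, encoding="unicode"))
```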
One of the most difficult challenges in counter-terrorism, crime fighting and surveillance missions is to accurately identify people from image or video footage in order to catch shortlisted terrorists and criminals. For this purpose, the imaging devices used in video surveillance systems are being improved in many aspects, such as spatial resolution, frame rate, dynamic range and spectral characteristics, in order to achieve better imaging performance for both monitoring and automatic detection/recognition tasks. These development efforts aim to improve the basic imaging characteristics of the device, such as the ratio of average color values (i.e., red/green (R/G) and blue/green (B/G)) of the video footage, irrespective of high-level semantic knowledge such as the presence and locations of the monitored objects/individuals in the scene. Nevertheless, a scene under multiple lighting conditions with different spectral characteristics cannot be accurately captured by a standard imaging system in terms of color distribution. On the other hand, color is one of the most crucial cues for identifying objects and individuals, both for operators and for AI-based systems, and capturing true color features under various harsh environments is a critical issue. In this study, a cognitive imaging system prototype is proposed that captures the true color distribution over human faces by taking into account the locations of the faces detected by a developed smart camera. This is achieved with a smart Auto-White Balance (AWB) algorithm that uses only a region of interest (ROI) to compute the color ratios of the video frame. The current ROI is selected by intelligently and sequentially traversing the detected faces so that all faces in the scene are handled effectively. The experimental results show that the proposed "cognitive" camera achieves an impressive improvement in color accuracy/constancy for the detected human faces under the illumination conditions used in the experiments.
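A minimal sketch of the ROI-driven white-balance idea follows: estimate R/G and B/G gains from a detected face region and apply them to the whole frame. The gray-world style gain estimate and the green-channel anchor are assumptions for illustration; the authors' AWB algorithm may differ.

```python
# ROI-based auto white balance: gains estimated from a face region only.
import numpy as np

def awb_from_roi(frame: np.ndarray, roi: tuple) -> np.ndarray:
    """frame: HxWx3 uint8 RGB image; roi: (x, y, w, h) of a detected face."""
    x, y, w, h = roi
    patch = frame[y:y + h, x:x + w].astype(np.float64)
    r_mean, g_mean, b_mean = patch.reshape(-1, 3).mean(axis=0)
    gains = np.array([g_mean / r_mean, 1.0, g_mean / b_mean])  # anchor on green
    balanced = frame.astype(np.float64) * gains
    return np.clip(balanced, 0, 255).astype(np.uint8)

# Usage: frame from the camera, roi from any face detector.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = awb_from_roi(frame, roi=(200, 150, 80, 80))
```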
Automation of production processes using robots is a development priority for many industrial enterprises. Robotization aims to free people from dangerous or routine work. At the same time, robots are able to perform tasks more efficiently than humans, and human-robot collaboration makes it possible to combine the strengths and effectiveness of robots with human cognitive ability in a single flexible system and, as a result, to organize flexible methods for automating and reconfiguring production processes. In this work, we focus on implementing a method of human-robot interaction based on the recognition of gesture commands from a human operator. As the action recognition method, we propose an approach based on extracting a human skeleton and classifying it with a neural network. To test the effectiveness of the proposed algorithm, the possibility of transmitting gesture commands to the robot and the organization of a contactless robot control method, simulation modeling was carried out in the ROBOGUIDE environment for industrial robots, provided by Fanuc.
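A minimal sketch of the classification stage is shown below: a small MLP over flattened 2D skeleton keypoints (e.g. 17 joints). The pose estimator that produces the keypoints, the joint count and the gesture classes are assumptions, not the authors' setup.

```python
# Gesture classification from skeleton keypoints with a small PyTorch MLP.
import torch
import torch.nn as nn

NUM_JOINTS, NUM_GESTURES = 17, 5  # illustrative values

class GestureClassifier(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * 2, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, NUM_GESTURES),
        )

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, NUM_JOINTS, 2) normalized image coordinates
        return self.net(keypoints.flatten(start_dim=1))

model = GestureClassifier()
logits = model(torch.rand(8, NUM_JOINTS, 2))   # dummy batch of skeletons
gesture_ids = logits.argmax(dim=1)             # command index sent to the robot
```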
Data annotation is a time-consuming, labor-intensive step in supervised learning, mainly for detection and classification. Most of the time, human annotation effort is required to obtain an accurately labeled dataset, which is costly and sometimes infeasible, especially for large datasets. Most novel methods use various networks to annotate the data; however, those methods still require numerous hand-labeled examples. To address this problem, we propose a method that makes the process as human-independent as possible while preserving annotation performance. The proposed method is applicable to datasets in which the majority of the frames/images contain a single object (or a known number, "n", of objects). The method starts with an initial annotation network trained on a small amount of labeled data, 10% of the total training set, and then continues iteratively. We use the annotation network to select the subset of the training set that is to be hand-labeled for the next iteration. This way, examples that are more likely to improve the annotation network can be selected. The total number of necessary hand-labeled images depends on the specific problem. We observed that, with the proposed approach, manually annotating approximately 25% of the dataset was sufficient, rather than annotating all the images. This percentage can vary with the complexity and type of the annotation network, as well as the dataset content. Our method can be used with existing (semi-)automatic annotation tools.
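The iterative idea can be sketched on a toy classification task: start from a small labeled seed (around 10%), then repeatedly "hand-label" the examples the current model is least confident about. The seed size, batch size and least-confidence criterion are illustrative assumptions, not the paper's selection rule.

```python
# Iterative annotation loop on synthetic data with scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
labeled = set(rng.choice(len(X), size=100, replace=False))  # ~10% seed

for iteration in range(5):
    idx = sorted(labeled)
    model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    pool = np.array([i for i in range(len(X)) if i not in labeled])
    confidence = model.predict_proba(X[pool]).max(axis=1)
    # "Hand-label" the 50 least confident pool examples for the next round.
    labeled.update(pool[np.argsort(confidence)[:50]].tolist())
    print(f"iter {iteration}: {len(labeled)} labeled examples")
```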
The field of Reinforcement Learning continues to show promise in solving old problems in new, innovative ways. Thanks to these algorithms' ability to learn without an explicit set of labeled training data, the action-environment-reward formulation has lured many researchers into framing old problems in this manner. Recent publications have demonstrated how a multi-agent reinforcement learning approach can yield an optimization policy superior to current standard optimization algorithms. The challenge with the aforementioned approaches is the inclusion of the gradient in the state space. This forces a costly calculation that is often the bottleneck in most machine learning problems, frequently limiting or preventing training at the edge or on the front lines. While previous works dating back decades have demonstrated the ability to train simple machine learning models without the use of gradients, none have done so using a policy that leverages previous experience to solve the problem more quickly. This work shows how a Multi-Agent Reinforcement Learning approach can be used to optimize models during training without the gradient of the loss function, effectively eliminating the need for backpropagation and significantly reducing the computational power required to train a model. Furthermore, the work examines conditions under which the agents fail to find an optimal solution, as well as how this approach can be beneficial in complex defense applications.
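To make the gradient-free training idea concrete, the sketch below uses a much simpler technique than the paper's multi-agent RL policy: a (1+1) random-search / evolution-strategy style update on a tiny linear regressor, with no backpropagation anywhere.

```python
# Gradient-free optimization of a small model: keep a random perturbation
# of the weights only if it lowers the loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

def loss(w: np.ndarray) -> float:
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(3)
best = loss(w)
for step in range(2000):
    candidate = w + 0.05 * rng.normal(size=3)   # random perturbation, no gradient
    cand_loss = loss(candidate)
    if cand_loss < best:                        # keep the perturbation if it helps
        w, best = candidate, cand_loss

print("recovered weights:", np.round(w, 2), "loss:", round(best, 4))
```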
This conference presentation was prepared for the Artificial Intelligence and Machine Learning in Defense Applications IV conference at SPIE Security + Defence, 2022.
The applied research presented in this paper describes an approach to providing meaningful evaluation of the Machine Learning (ML) components in a Full Motion Video (FMV) Machine Learning Enabled System (MLES). The MLES itself is not discussed in the paper. We focus on the experimental activity designed to provide confidence that the performance of the MLES, when fielded under dynamic and uncertain conditions, will not be undermined by a lack of ML robustness, for example to real-world changes of the same scene under differing lighting conditions. The paper details the technical approach and how it is applied to data across the overall experimental pipeline, which consists of a perturbation engine, a test pipeline and metric production. The data come from a small imagery dataset, and the results are shown and discussed as part of a proof-of-concept study.
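A minimal sketch of what a perturbation engine might do is given below: apply controlled brightness and gamma changes to a frame so that detector outputs on the original and perturbed images can be compared. The specific perturbations and parameter ranges are illustrative assumptions, not the paper's engine.

```python
# Simple photometric perturbation engine for robustness testing.
import numpy as np

def perturb(image: np.ndarray, gain: float = 1.0, gamma: float = 1.0) -> np.ndarray:
    """image: HxWx3 uint8. gain scales brightness, gamma warps the tone curve."""
    scaled = np.clip(image.astype(np.float64) * gain, 0, 255) / 255.0
    return (255.0 * scaled ** gamma).astype(np.uint8)

image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
sweep = [perturb(image, gain=g, gamma=gm)
         for g in (0.5, 1.0, 1.5) for gm in (0.8, 1.0, 1.2)]
# Each perturbed frame would then be pushed through the test pipeline and the
# detector's metrics compared against the unperturbed baseline.
```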
Maritime object detection in synthetic aperture radar (SAR) imagery has seen a resurgence of interest due to the introduction of deep object detection models and large-scale datasets from competitions such as xView3. However, as novel examples are seen in the wild and as new SAR sensors emerge, existing models will need to be retrained with relevant new data. Active learning (AL) aims to automate and optimize this data curation process. In this work, we evaluate state-of-the-art AL algorithms for the task of SAR maritime object detection. By analyzing and identifying gaps in current AL solutions, we seek to motivate efforts to improve their utility in practical settings.
Performance metrics used by academia to evaluate video object detection algorithms are usually not informative for assessing whether the algorithms of interest are suitable (mature enough) for real-world deployment. We propose an approach for defining performance metrics suited to various operational scenarios. In particular, we define four operational modes: surveillance (alarm), situational awareness, detection and tracking. We then describe the performance metrics for the needs of each operational scenario and explain the underlying reasoning. The metrics are compatible with common practices for constructing in-house video datasets. We believe these metrics provide useful insight into the usability of the algorithms. We also demonstrate that an algorithm which (at first glance) seems to have insufficient performance for deployment can be used in a real-world system (with simple post-processing) if its parameters are configured to provide high scores on the scenario-specific metrics. We also show that the same underlying algorithm can be used for different operational scenarios if its parameters (and/or post-processing steps) are adjusted to meet criteria based on the relevant scenario-based performance metrics.
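The contrast between a conventional frame-level metric and a scenario-oriented one can be illustrated with a toy surveillance (alarm) example: in alarm mode it may be enough that an intrusion event is detected at least once, even if many individual frames are missed. The event definition used here (contiguous ground-truth frames) is an assumption for the sketch, not the paper's exact metric.

```python
# Frame-level recall versus event-level ("alarm") recall on a toy sequence.
import numpy as np

frames_with_target = np.array([1, 1, 1, 0, 0, 1, 1, 1, 1, 0])   # ground truth per frame
detections         = np.array([0, 1, 0, 0, 0, 0, 0, 1, 1, 0])   # detector output per frame

frame_recall = (frames_with_target & detections).sum() / frames_with_target.sum()

# Split ground truth into contiguous events; an event counts as detected if any
# of its frames triggered a detection.
positive = np.flatnonzero(frames_with_target)
events = np.split(positive, np.where(np.diff(positive) > 1)[0] + 1)
event_recall = np.mean([detections[e].any() for e in events])

print(f"frame-level recall: {frame_recall:.2f}, event-level recall: {event_recall:.2f}")
```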
In this study, the performance of online and offline tracking algorithms frequently used in the literature was compared on defined datasets. For this purpose, six different datasets were prepared, each consisting of consecutive frames. In each dataset, the target has different motion characteristics and the background types differ from one another. A total of six well-known algorithms were used for the comparison: the KCF, MOSSE, CSRT, TLD, GOTURN and Siamese tracking algorithms. In conclusion, especially in cases where the target is small and the SNR is low, the highest performance is obtained with the KCF algorithm. On the other hand, when the target is large and the SNR is high, the Siamese algorithm is observed to handle changes in target shape better. In this context, considering real scenarios, it may be possible to use the algorithms in a hybrid way to obtain better performance.
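Several of the compared trackers are available through OpenCV's tracking module, so a minimal usage sketch looks like the following. It assumes opencv-contrib-python is installed; exact factory names vary slightly between OpenCV 4.x versions (e.g. MOSSE and TLD live under cv2.legacy in recent releases), and the video path is a placeholder.

```python
# Running one of the compared trackers (KCF/CSRT) with OpenCV.
import cv2

cap = cv2.VideoCapture("sequence.mp4")       # placeholder path to a test sequence
ok, frame = cap.read()

tracker = cv2.TrackerKCF_create()            # or cv2.TrackerCSRT_create()
bbox = cv2.selectROI("init", frame)          # (x, y, w, h) of the target
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:                 # Esc to quit
        break
```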
This study accompanies the initial public release of the software for ARORA, or A Realistic Open environment for Rapid Agent training, and marks a high point of several years of work on the mature and completely open ARORA simulator. The purpose of ARORA is to support the training of an autonomous agent for tasks associated with a large-scale, geospecific outdoor urban environment, including the task of navigating as a car. The study elaborates on the simulator's architecture, agent, and environment. For the environment, ARORA improves on similar simulators through an unconstrained geospecific environment with detailed semantic annotation. The agent is represented as a car available with four different options of physics fidelity. The agent also has sensors available: a pose sensor, a camera sensor, and a set of three proximity sensors. Future use cases extend to both civilian and military communities (including human training and wargaming) for training autonomous agents in outdoor urban environments. The study also presents a brief description of NavSim, a Python-based companion tool whose purpose is to connect to ARORA (or any similar simulator) and train an agent using reinforcement-learning algorithms. The study also describes challenges encountered during development and the corresponding workarounds and solutions. The goal of the ARORA and NavSim system is to provide communities with a high-fidelity, publicly available, free, and open-source system for training an autonomous agent as a car.
Any program tasked with the evaluation and acquisition of algorithms for use in deployed scenarios must have an impartial, repeatable, and auditable means of benchmarking both candidate and fielded algorithms. Success in this endeavor requires a body of representative sensor data, data labels indicating the proper algorithmic response to the data as adjudicated by subject matter experts, a means of executing algorithms under review against the data, and the ability to automatically score and report algorithm performance. Each of these capabilities should be constructed in support of program and mission goals. By curating and maintaining data, labels, tests, and scoring methodology, a program can understand and continually improve the relationship between benchmarked and fielded performance of acquired algorithms. A system supporting these program needs, deployed in an environment with sufficient computational power and the necessary security controls, is a powerful tool for ensuring due diligence in the evaluation and acquisition of mission-critical algorithms. This paper describes the Seascape system and its place in such a process.
Identifying an object of interest in thermal images plays a vital role in several military and civilian applications. Deep learning approaches have shown their superiority for object detection on various RGB datasets. However, for thermal images, their low resolution and lack of detail pose a huge challenge that hinders accuracy. In this paper, we propose an improved version of the YOLOv5 model to tackle this problem. A Convolutional Block Attention Module (CBAM) is integrated into the traditional YOLOv5 for better object representation by focusing on important features and neglecting unnecessary ones. The Selective Kernel Network (SENet) is added to maximize the use of shallow features. Furthermore, a multiscale detection mechanism is utilized to improve small-object detection accuracy. We train our model on mixed visible-thermal images collected from the LSOTB-TIR, LLVIP, and COCO datasets. We evaluate the performance of our method on 8 classes of objects: person, bicycle, airplane, helicopter, car, motorbike, boat, and tank. Experimental results show that our approach achieves an mAP of up to 90.2%, outperforming the original YOLOv5 and other popular methods.
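For reference, a minimal CBAM sketch (channel attention followed by spatial attention, as in the original CBAM paper) is given below in PyTorch; how it is wired into the authors' modified YOLOv5 backbone is not specified here, and the feature-map sizes are illustrative.

```python
# Convolutional Block Attention Module (CBAM) sketch in PyTorch.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7) -> None:
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

features = torch.rand(2, 64, 80, 80)   # e.g. a backbone feature map
refined = CBAM(64)(features)
```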
Feature extraction techniques play an essential role in classifying and recognizing targets in synthetic aperture radar (SAR) images. This article proposes a hybrid feature extraction technique based on convolutional neural networks and principal component analysis. The proposed method is used to extract features of oil rigs and ships in C-band polarimetric synthetic aperture radar images obtained with the Sentinel-1 satellite system. The extracted features are used as input to the logistic regression (LR), support vector machine (SVM), random forest (RF), naive Bayes (NB), decision tree (DT), and k-nearest-neighbors (kNN) classification algorithms. Furthermore, the Kruskal-Wallis and Dunn statistical tests were used to show that the proposed extraction algorithm has a significant impact on the performance of the classifiers.
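A minimal sketch of the "features to PCA to classifier" stage with scikit-learn follows; the CNN feature extractor is assumed to have already produced one feature vector per image chip, with random data standing in for those features, and the dimensions and class labels are illustrative.

```python
# PCA-reduced CNN features feeding a classifier (scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
cnn_features = rng.normal(size=(300, 512))   # 300 chips, 512-dim CNN features (dummy)
labels = rng.integers(0, 2, size=300)        # 0 = ship, 1 = oil rig (toy labels)

clf = make_pipeline(PCA(n_components=32), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, cnn_features, labels, cv=5)
print("LR accuracy with PCA features:", scores.mean().round(3))
# The same pipeline shape applies to the SVM, RF, NB, DT and kNN classifiers.
```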
Small unmanned aerial vehicles (UAVs) are becoming more and more popular and also pose a challenge for civilian and military security. A UAV has to be detected first, but environmental conditions (e.g. night or fog) impede detection. To assess the threat of a potentially hostile UAV, identification is helpful. If the type of UAV can be determined, information about size, payload, velocity and range can be given and countermeasures can be considered. Identification of UAVs can be more accurate when multiple spectral ranges are used at the same time. We present a systematic approach for acquiring multispectral signatures in the field and in the lab, storing them in a structured database, and composing partially synthetic images as training data for identification with an artificial neural network. We set up a multispectral camera system comprising three imagers in the visible spectrum, SWIR and MWIR. The cameras are externally triggered, which allows image acquisition in the field with a synchronized video stream. In addition, high-resolution images are taken in the lab from different angles all around the micro UAV. A specific background is chosen so that it can be masked, and with a given real-world background image a partially synthetic image can be generated. These can be validated with data gathered in the field. Both are stored in a database, along with metadata, to allow access to particular data when needed. Synthetic images and signatures from the field can be used as multispectral training data for an artificial neural network to enable identification of a UAV.
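A minimal sketch of composing such a partially synthetic image is shown below: mask the known lab background (here via a simple per-pixel color distance to a reference backdrop color) and paste the foreground UAV pixels onto a real-world background. The thresholding rule and threshold value are assumptions for illustration.

```python
# Mask a known lab backdrop and composite the foreground onto a real background.
import numpy as np

def composite(lab_image: np.ndarray, bg_color: np.ndarray,
              real_background: np.ndarray, threshold: float = 40.0) -> np.ndarray:
    """All images are HxWx3 uint8; bg_color is the known lab backdrop color."""
    dist = np.linalg.norm(lab_image.astype(np.float64) - bg_color, axis=2)
    foreground_mask = (dist > threshold)[..., None]          # True where the UAV is
    return np.where(foreground_mask, lab_image, real_background).astype(np.uint8)

lab_image = np.full((240, 320, 3), 30, dtype=np.uint8)        # dummy lab shot
lab_image[100:140, 150:200] = (180, 180, 180)                 # dummy "UAV" patch
real_background = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
synthetic = composite(lab_image, np.array([30, 30, 30]), real_background)
```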
Cooperative autonomous systems, such as swarms, multi-camera systems, or fleets of self-driving vehicles, can better understand a given scene by sharing multiple modalities and varying viewpoints. This superposition of data adds robustness and redundancy to a system typically burdened with obstructions and unrecognizable, distant objects. Collaborative perception is a key component of cooperative autonomous systems where modalities can include camera sensors, LiDAR, RADAR, and depth images. Meanwhile, the amount of useful information that can be shared between agents in a cooperative system is constrained by current communication technologies (e.g. bandwidth limitations). Recent developments in learned compression can enable the training of end-to-end cooperative systems using deep learning with compressed communication in the pipeline. We explore the use of a deep learning object detector in a cooperative setting with a learned compression model facilitating communication between agents. To test our algorithm, this research will focus on object detection in the image domain as a proxy for one of the modalities used by collaborative systems.
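A minimal sketch of a learned-compression bottleneck between two agents is given below: a small convolutional autoencoder whose latent tensor stands in for the message sent over the constrained link. The architecture, sizes and channel counts are illustrative assumptions, not the model explored in the paper.

```python
# Tiny autoencoder bottleneck standing in for learned compression between agents.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.encode = nn.Sequential(                       # runs on the sending agent
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 8, 4, stride=2, padding=1),
        )
        self.decode = nn.Sequential(                       # runs on the receiving agent
            nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        latent = self.encode(x)              # compact representation to transmit
        return self.decode(latent)           # reconstruction fed to the detector

frame = torch.rand(1, 3, 128, 128)
reconstructed = Bottleneck()(frame)          # the object detector would consume this
```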
The article proposes an approach to improving the accuracy of restoring object boundaries used to create 3D structures by analyzing data obtained from a machine vision system. In the first stage, the number of color gradients is reduced; this technique merges similar values into common, enlarged structures. This operation simplifies the analyzed objects, since small details are not important. In parallel with the first stage, denoising is performed. The paper proposes applying a multi-criteria processing method capable of smoothing locally stationary sections while preserving object boundaries. As the algorithm for strengthening object boundaries, a modification of the combined multi-criteria method is used, which makes it possible to reduce the effect of salt-and-pepper noise and impulse failures as well as to strengthen the detected object boundaries. The resulting images with enhanced boundaries are fed to the input of the block for constructing three-dimensional objects. Data obtained both from a stereo pair and from a camera performing 3D reconstruction with structured light were used in this work. On a set of synthetic data simulating operation under real conditions, the increase in system efficiency achieved with the proposed approach is shown. Based on field data acquired under interfering factors such as dust and fog, the applicability of the proposed approach to increasing the accuracy of restoring object boundaries used to create three-dimensional structures is shown. Images of simple shapes are used as the analyzed objects.
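The two preparatory operations can be sketched as follows: reduce the number of color gradients by quantizing the color values, and suppress salt-and-pepper noise with a median filter. The quantization step size and filter kernel are illustrative; the paper's combined multi-criteria method is not reproduced here.

```python
# Color-gradient reduction (quantization) and salt-and-pepper denoising.
import cv2
import numpy as np

def reduce_gradients(image: np.ndarray, step: int = 32) -> np.ndarray:
    """Merge similar color values into coarser bins (HxWx3 uint8 input)."""
    return ((image.astype(np.int32) // step) * step).astype(np.uint8)

image = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
quantized = reduce_gradients(image)
denoised = cv2.medianBlur(image, 5)   # robust against salt-and-pepper noise
# Both results would then feed the boundary-strengthening and 3D-reconstruction stages.
```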
This paper presents a new method for video segmentation into sets of objects, background, and static and dynamic textures using deep learning neural networks in the quaternion space. We introduce a novel quaternionic anisotropic gradient (QAG) that combines the color channels and the orientations in the image plane. Local polynomial estimates and the ICI rule are used for the QAG calculation. Since the image is usually converted to grayscale for segmentation tasks, important information about color, saturation, and other color-related properties is lost. To solve this problem, we use the quaternion framework to represent a color image so that all three channels are considered simultaneously when segmenting the RGB image. Using the QAGs, we extract local orientation information from the color images. Second, to improve the segmentation result, we apply a neural network to this derived orientation information. The presented approach yields clearer and more detailed boundaries of objects of interest. Experimental comparisons to state-of-the-art video segmentation methods demonstrate the effectiveness of the proposed approach.
People in the deaf-mute community benefit greatly from Chinese sign language (CSL) recognition, which can promote communication between sign language users and non-users. Recently, several studies have investigated sign language recognition with millimeter-wave radar because of its advantages of non-contact measurement and privacy protection. The millimeter-wave radar acquires motion characteristics from micro-Doppler images, which can be used for CSL recognition. Existing recognition methods measure the micro-Doppler image in a single direction, which cannot capture all the motion information of CSL and leads to failures in recognizing CSL signs with similar actions. To improve the recognition accuracy, this paper proposes a multi-view deep neural network (MV-DNN) that fuses micro-Doppler features measured in different directions. Simulation results show that the recognition accuracy of the proposed method reaches 96% for eight CSL signs, which is 8% higher than that of the traditional single-view method.
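A minimal sketch of the multi-view fusion idea is given below: two small CNN branches process the micro-Doppler images from two viewing directions and their features are concatenated before classification. The branch sizes, the number of views and the class count are illustrative assumptions, not the paper's MV-DNN.

```python
# Two-view feature fusion for micro-Doppler classification (PyTorch).
import torch
import torch.nn as nn

def branch() -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )

class MultiViewNet(nn.Module):
    def __init__(self, num_classes: int = 8) -> None:
        super().__init__()
        self.view_a, self.view_b = branch(), branch()
        self.head = nn.Linear(64, num_classes)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.view_a(a), self.view_b(b)], dim=1)   # feature fusion
        return self.head(fused)

doppler_a = torch.rand(4, 1, 64, 64)   # micro-Doppler image, view A
doppler_b = torch.rand(4, 1, 64, 64)   # micro-Doppler image, view B
logits = MultiViewNet()(doppler_a, doppler_b)
```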