This PDF file contains the front matter associated with SPIE Proceedings Volume 11550 including the Title Page, Copyright information, and Table of Contents.
Training an artificial neural network with backpropagation algorithms requires an extensive computational process. Our recent work proposes to implement the backpropagation algorithm optically for in-situ training of both linear and nonlinear diffractive optical neural networks, which accelerates training and improves the energy efficiency of the core computing modules. We numerically validated that the proposed in-situ optical learning architecture achieves accuracy comparable to in-silico training on an electronic computer for object classification and matrix-vector multiplication, while further allowing adaptation to system imperfections. Moreover, the self-adaptive property of our approach enables a novel application of the network: all-optical imaging through scattering media. The proposed approach paves the way for the robust implementation of large-scale diffractive neural networks that perform distinctive tasks all-optically.
Conventional high-level sensing techniques require high-fidelity images to extract visual features, which entails high software or hardware complexity. We present a single-pixel sensing (SPS) technique that performs high-level sensing directly from a small number of coupled single-pixel measurements, without the conventional image acquisition and reconstruction process. The technique consists of three steps: binarized light modulation, single-pixel coupled detection, and end-to-end deep-learning-based decoding. The binarized modulation patterns are optimized jointly with the decoding network by a two-step training strategy, leading to the fewest required measurements and optimal sensing accuracy. The effectiveness of SPS is experimentally demonstrated on the classification task of the handwritten MNIST dataset, where 96% classification accuracy at ∼1 kHz is achieved. The reported SPS technique is a novel framework for efficient machine intelligence with low hardware and software complexity, and it additionally provides strong encryption.
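As a minimal illustration of the coupled measurement model described above (the decoding network and the two-step pattern optimization are omitted; all sizes, patterns, and data below are stand-ins rather than the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 28x28 scene coupled into M single-pixel measurements.
N, M = 28 * 28, 64

# Binarized modulation patterns; in the paper these are co-optimized with the
# decoding network, here they are fixed random 0/1 patterns for illustration.
patterns = rng.choice([0.0, 1.0], size=(M, N))

scene = rng.random(N)            # stand-in for the object's intensity image
measurements = patterns @ scene  # single-pixel coupled detection: y = A x

print(measurements.shape)        # (64,) -- far fewer values than pixels
```

The decoder would then map the 64 coupled values directly to a class label, skipping image reconstruction entirely.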
Fourier ptychographic microscopy (FPM) is a recently developed computational imaging technique that achieves high-resolution imaging with a wide field-of-view by overcoming the limitation of the optical space-bandwidth product (SBP). In traditional FPM systems, the aberration of the optical system is ignored, which may significantly degrade the reconstruction results. In this paper, we propose a novel FPM reconstruction method based on forward neural network models with aberration correction, termed FNN-AC. Zernike polynomials are used to represent the wavefront aberration in our method. Both the spectrum of the sample and the coefficients of the different Zernike modes are treated as learnable weights in the trainable layers. By minimizing the loss function during training, the coefficients of the different Zernike modes are learned, which can be used to correct the aberration of the optical system. Simulations have been performed to verify the effectiveness of FNN-AC.
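A sketch of the learnable-aberration idea: the pupil phase is expressed as a weighted sum of Zernike modes, and in FNN-AC those weights would be trainable layer parameters. The grid size, chosen modes, and coefficient values below are illustrative only:

```python
import numpy as np

# Pupil-plane coordinate grid over the unit circle.
n = 128
y, x = np.mgrid[-1:1:1j * n, -1:1:1j * n]
rho, theta = np.hypot(x, y), np.arctan2(y, x)
aperture = (rho <= 1.0).astype(float)

# A few low-order Zernike modes (Noll ordering: defocus, two astigmatisms).
zernike = np.stack([
    np.sqrt(3) * (2 * rho**2 - 1),            # defocus
    np.sqrt(6) * rho**2 * np.sin(2 * theta),  # oblique astigmatism
    np.sqrt(6) * rho**2 * np.cos(2 * theta),  # vertical astigmatism
])

# In FNN-AC these coefficients are trainable weights; fixed values here.
coeffs = np.array([0.5, -0.2, 0.1])

phase = np.tensordot(coeffs, zernike, axes=1)
pupil = aperture * np.exp(1j * phase)  # aberrated pupil for the forward model
```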
In multispectral photoacoustic imaging (PAI), the illumination spectrum inside biological tissue varies spatially, leading to poor quantification accuracy for blood oxygen saturation (SO2). The key to solving this problem is to invert light diffusion, which is extremely complicated and inaccurate given the limited information available in PAI. Despite great effort, the few available methods to date are all limited in terms of in vivo performance and physical insight. Here, we introduce an analytical Monte Carlo method with which we prove that the light spectrum in biological tissue mathematically lies in a high-dimensional convex cone. The model offers new insight into the origin of the spectral deterioration, and we find it possible to calculate SO2 accurately using only the photoacoustic data at a single spatial location when the signal-to-noise ratio is sufficient. The method was demonstrated numerically, and our preliminary phantom experiments also confirmed its effectiveness.
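For context, the classical per-pixel linear-unmixing baseline (which the above work improves upon) estimates SO2 by solving a small least-squares system; the extinction values and PA amplitudes below are placeholders, not tabulated data:

```python
import numpy as np

# Illustrative (not tabulated) molar extinction values for HbO2 and Hb at
# three wavelengths; real values would come from published spectra.
E = np.array([[2.8, 3.2],
              [1.9, 3.9],
              [3.5, 1.4]])          # rows: wavelengths, cols: [HbO2, Hb]

p = np.array([2.55, 2.40, 2.45])    # hypothetical PA amplitudes at one location

# Classical linear unmixing: solve p = E @ c for the chromophore
# concentrations, then form the saturation ratio.
c, *_ = np.linalg.lstsq(E, p, rcond=None)
so2 = c[0] / c.sum()
print(f"estimated SO2 = {so2:.2f}")
```

This baseline implicitly assumes a wavelength-independent local fluence, which is exactly the assumption the spectral-cone model relaxes.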
Speckle autocorrelation based on the optical memory effect is an interesting and important method for scattering imaging. However, the effective detection range is limited by the radial smearing of the speckle field under wide-spectrum illumination. In this paper, by utilizing the response of the point spread function (PSF) to the image distance (the distance between the detector plane and the scattering medium), we propose a method to improve imaging quality under wide-spectrum illumination. The PSF is sensitive to the image distance: as the distance between the detection plane and the reference plane increases, the correlation coefficient between their PSFs decreases. Superposing the autocorrelations of speckle patterns recorded at different image distances suppresses the statistical noise and thus improves the reconstruction quality. This method reduces the dependence on light source power and effective detection range, showing promise for seeing through natural turbid media.
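A minimal sketch of the averaging step, assuming the speckle patterns have already been recorded at different image distances (random arrays stand in for them here):

```python
import numpy as np

def autocorrelation(img):
    """Autocorrelation via the Wiener-Khinchin theorem."""
    f = np.fft.fft2(img - img.mean())
    return np.fft.fftshift(np.fft.ifft2(np.abs(f) ** 2).real)

rng = np.random.default_rng(1)
# Stand-ins for speckle patterns recorded at several image distances.
speckles = [rng.random((256, 256)) for _ in range(8)]

# Superposing (averaging) the individual autocorrelations suppresses the
# statistical noise term, while the object autocorrelation adds coherently.
mean_ac = np.mean([autocorrelation(s) for s in speckles], axis=0)
```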
The Moon has stable spectral properties, with a variation tendency of about 10^-8 per year. Because it is independent of the scattering and absorbing effects of the atmosphere, the Moon is an ideal radiometric reference for Earth-observing satellites. Research on a lunar radiometric model offers a new path to on-orbit radiometric calibration of remote sensing satellites that is free of the atmospheric uncertainties affecting calibration accuracy. China's GF-4 is a geostationary remote sensing satellite with a large-array CMOS detector; by rolling the satellite, it can acquire whole lunar disk images without earth-atmosphere stray light, and the instantaneous field of view (IFOV) of its VNIR sensor toward the Moon is about 500 m. Firstly, recalibration is implemented before further processing. Secondly, the radiometric properties of the Moon are retrieved from the recalibration results. A commonly used method in lunar radiometry is adopted in this work: deriving the bidirectional reflectance factor (BRF) at three typical lunar calibration sites (Apollo-16, MS-2, and CE-3) from the GF-4 lunar images, and comparing the BRFs with SP on SELENE, M3 on Chandrayaan-1, and IIM on CE-1. The model from the GF-4 VNIR sensor shows increasingly high consistency with the other existing models as wavelength increases. The results indicate that the lunar radiometric model developed in this work has the potential to join the group of lunar models.
We propose a new method for rendering a panoramic light field within a certain range, which provides an effective way to acquire virtual reality data. Different from existing panoramic light field reconstruction algorithms, we propose the concept of the ray sphere to render a panoramic light field. Using the intrinsic and extrinsic parameters of the cameras in the acquisition system, all the light field data are uniformly expressed on the ray sphere. The constructed ray sphere enables the rendering of a panoramic light field from any viewpoint within a 3DOF+ space, which cannot be achieved with related methods. In addition, we design and build an acquisition system to capture real scenes and verify the effectiveness of our method. Experimental results show that our method can render the panoramic light field from any viewpoint on the horizontal plane within half the radius of the acquisition system, and can effectively process light field video data.
To solve the problems of traditional integral imaging, such as poor visual quality, ray crosstalk in display, and obvious graininess, a display method based on a discrete glued lens array and a holographic diffuser is proposed. The proposed method replaces the continuous single lens array used in the traditional integral imaging display with a discrete glued lens array and a holographic diffuser. In this paper, the structures and imaging quality of the single lens and the glued lens are designed, analyzed, and compared, and the diffusion effect of the holographic diffuser is theoretically analyzed. We designed two display systems based on an ultra-high-density small-pitch LED display plane. The experimental results show that, compared with the traditional integral imaging display method based on a continuous single lens array, the proposed method significantly improves visual quality: it effectively reduces the influence of ray crosstalk on the 3D images, smooths the discontinuous light field distribution, and reduces graininess. In addition, a traditional continuous single lens array requires a mold of the same size as the display platform during processing, whereas a discrete glued lens array only requires processing the unit lenses and then assembling them; for a large-scale integral imaging display system, the discrete glued lens array is therefore more suitable, being cheaper and easier to manufacture. To meet the experimental expectations and improve the display visual quality, we used a glued lens array in the experiment; compared with an aspheric lens, which is difficult to process, the glued lens is easier to design and process and gives good display visual quality.
Imaging through dynamic scattering media is a challenging problem in situations such as imaging under dense fog or in turbid water. Here, we use fat emulsion suspensions as optical phantoms to mimic turbid media and propose a single-shot, end-to-end learning-based method to directly retrieve objects from the corresponding scattering images. We present measurements of the dynamic characteristics of Intralipid dilutions, including optical thickness and decorrelation time. A glass jar 33.6 cm in length is used in our incoherent imaging system, in which background noise is also present. Experimental results show that our approach can reconstruct the object almost perfectly under strong background light, where the signal-to-noise ratio is below -17 dB and the optical depth is close to 16.
With the development of machine vision technology, visual navigation with images requires matching either local geometric features or global features of the images; however, local geometric feature matching has low accuracy and is difficult to use for tracking. In contrast, template-based global feature matching can directly use the information of the entire image and is highly robust to illumination variations and occlusions, so it has attracted widespread attention. The classical template-based matching algorithms mainly include Sum of Absolute Differences (SAD), Sum of Squared Differences (SSD), Normalized Cross Correlation (NCC), and Mutual Information (MI). To evaluate and compare the algorithms more fairly, in this paper we compare Mean Absolute Differences (MAD), Mean Square Differences (MSD), Zero-mean Normalized Cross Correlation (ZNCC), and Normalized Mutual Information (NMI). In the experiments, Gaussian noise, illumination variations, and occlusion were applied to the current image to simulate complex navigation scenes, and the image was then matched against the template images. The matching values produced by the four algorithms in different scenes are collectively called alignment metric values. The matching performance of the four algorithms was evaluated in terms of the smoothness of the metric values, the number of local extrema, and whether the best-scoring position coincided with the correct alignment position. The results showed that the accuracy of MSD was greatly affected by noise, making it unsuitable for noisy scenes, and that the number of local extrema of ZNCC changed greatly under noise, illumination changes, and occlusion, with the alignment metric values becoming unsmooth. In comparison, NMI showed good robustness and accuracy under all conditions.
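Reference implementations of the four alignment metrics as commonly defined (the paper's exact normalizations may differ slightly):

```python
import numpy as np

def mad(a, b):   # Mean Absolute Differences
    return np.mean(np.abs(a - b))

def msd(a, b):   # Mean Square Differences
    return np.mean((a - b) ** 2)

def zncc(a, b):  # Zero-mean Normalized Cross Correlation
    a, b = a - a.mean(), b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nmi(a, b, bins=32):  # Normalized Mutual Information: (H(X)+H(Y))/H(X,Y)
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    px, py = p.sum(1), p.sum(0)
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    hxy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return (hx + hy) / hxy
```

Sliding each metric over the search region and recording the value at every offset yields the alignment-metric surface whose smoothness and extrema the paper evaluates.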
A polarized skylight sensor can calculate the heading angle by detecting the polarization pattern of skylight, overcoming many inherent defects of conventional navigation methods. This paper develops a real-time bionic polarized skylight sensor. To eliminate the sensor's hardware errors, an indoor calibration experiment is conducted. We also propose an image processing method to enhance the sensor's robustness in urban environments. Comparative experiments show that both the calibration and the image processing algorithm achieve good results.
In the current research, the authors have developed an approach to accelerate the synthesis of realistic images with bidirectional ray tracing, based on the consecutive use of photon maps and backward photon (imphoton) maps. In the presented approach, the photon maps are used to account for possible sources of caustic illumination, while the imphoton maps are used to account for indirect illumination. The caustic photon maps are built at the points of the first diffuse event following at least one specular event along the path of the forward ray. Because caustic illumination is accounted for in the photon maps, it becomes possible to shift the imphoton maps farther from the first diffuse event of the backward ray and build them at the second diffuse event. The algorithm was optimized for modern multiprocessor workstations with non-uniform memory access. Using a three-level thread hierarchy that combines synchronous, semi-synchronous, and asynchronous components, the authors achieved an additional image synthesis speedup. The ray tracing methods are based on physically correct laws of light propagation, which allows physically correct modeling and image synthesis for optically complex scenes containing optical devices. The presented image synthesis methods were implemented and integrated into a computer system for photorealistic rendering, and sample rendering results are presented.
Objective quality assessment plays a vital role in the evaluation and optimization of panoramic video. However, most current methods consider only the structural distortion caused by the projection format and neglect the effect of clarity on quality evaluation. For this reason, we propose a new objective video quality assessment method for panoramic video. First, the source image and the distorted image are down-sampled to obtain five sets of images at different scales. Second, WS-SSIM is calculated at each scale. Finally, according to the degree of influence of each scale on subjective evaluation, different coefficients are assigned to the corresponding WS-SSIM values and the overall score is calculated. Comparative experiments on the database established in our laboratory demonstrate its effectiveness.
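A sketch of the multi-scale weighted pooling, with a placeholder per-scale metric standing in for WS-SSIM and hypothetical scale weights:

```python
import numpy as np

def downsample(img):
    """Simple 2x average-pool downsampling (crops to even dimensions)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] +
                   img[0::2, 1::2] + img[1::2, 1::2])

def multiscale_score(ref, dist, per_scale_metric, weights):
    """Weighted sum of a per-scale quality metric over five scales."""
    score = 0.0
    for w in weights:
        score += w * per_scale_metric(ref, dist)
        ref, dist = downsample(ref), downsample(dist)
    return score

# The per-scale metric in the paper is WS-SSIM; a negative-MSE placeholder
# keeps the sketch self-contained. Weights are illustrative, not the paper's.
rng = np.random.default_rng(0)
ref, dist = rng.random((64, 128)), rng.random((64, 128))
weights = [0.1, 0.15, 0.2, 0.25, 0.3]
print(multiscale_score(ref, dist, lambda a, b: -np.mean((a - b) ** 2), weights))
```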
In magnetic resonance imaging (MRI), the strong response to the signal is usually displayed as structural edges and textures, which are important for distinguishing different tissues and lesions. In current deep-learning-based super-resolution (SR) methods, some low-level structural information tends to gradually disappear as the network deepens, resulting in excessive smoothness in high-frequency regions. This phenomenon is particularly noticeable in MRI, with its poor brightness contrast and small gray-level dynamic range. Although the generative adversarial network (GAN) can repair structured textures well in natural images, it is likely to learn patterns that do not exist in the images, which poses risks for the reconstruction of medical images. Therefore, we propose an enhanced gradient guiding network (EG2N) to alleviate these problems. On the one hand, to improve contrast and suppress noise effectively, we use multi-scale wavelet enhancement for preprocessing, with the enhanced gradient map serving as a structural prior. On the other hand, blindly using dense connections in the feed-forward network brings redundancy, so structural features from an additional branch are added to specific layers to supplement the high-level features and constrain optimization. We add a feedback mechanism to promote cross-layer flow between low-level and high-level features. In addition, a perceptual loss is added to avoid distortion caused by excessive smoothing. The experimental results show that our method achieves the best visual results and excellent performance compared with state-of-the-art methods on the most popular MR image SR benchmarks.
Attribute information in fine-grained image recognition often provides more accurate and richer category-related information. How to effectively combine such knowledge to guide image classification has been one of the research hotspots in computer vision in recent years. We believe that using the associations between attributes to fuse attribute information can yield a more accurate representation of the image. In this paper, we propose a novel Multi-Task Attribute Fusion Model (MTAF) that makes two major improvements to the traditional multi-task learning framework: 1) Attribute-Aware Feature Discrimination: spatial attention and channel attention mechanisms are combined to enhance the CNN feature map, so that attributes can be associated with important positions and important channels of the image; 2) Transformer-Based Feature Fusion: a Transformer model is introduced to better learn the logical associations between attributes, so that the reconstructed features achieve the best classification performance. We verified our algorithm on two datasets: a self-collected medical dataset for thyroid benign/malignant identification, and an open dataset widely used for fine-grained image recognition. Experimental results on both datasets demonstrate that the proposed method achieves higher classification accuracy than the baselines.
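A CBAM-style sketch of the combined spatial and channel attention named in improvement 1); the module layout and hyper-parameters here are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Channel attention followed by spatial attention, of the kind MTAF
    uses to highlight attribute-relevant channels and positions."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from global average pooling.
        ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * ca
        # Spatial attention from channel-wise mean and max maps.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.max(1, keepdim=True).values], dim=1)))
        return x * sa

feat = torch.randn(2, 32, 14, 14)
print(ChannelSpatialAttention(32)(feat).shape)  # torch.Size([2, 32, 14, 14])
```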
Satellite imaging reconnaissance and aerial video surveillance may be the most important means of monitoring objects on the ground. First, we discuss the reconnaissance-information requirements of different missions, including strategic operations, tactical tasks, and fire control. In terms of precision and timeliness, we analyze the characteristics of reconnaissance images and videos captured by different platforms such as satellites and aerial vehicles. We then propose a general framework for the tactical exploitation of multi-source images, which can provide a Common Operational Picture (COP) that is both real-time and precise enough for tactical purposes or even for fire control. Within this framework, we detect, identify, and track objects, and obtain higher precision by integrating aerial videos with satellite images. To show how this integration works, we study image registration techniques, analyze the differences between the two kinds of images, and present a feature-based geo-registration procedure. By spatially aligning these images, we can obtain high-precision geographical coordinates in real time. The framework can be adopted for striking time-sensitive ground targets.
The current work is dedicated to solving one of the problems of virtual reality systems: restoring the coordinates of light sources for rendering realistic images of the resulting scene. In this paper, we propose approaches based on convolutional neural networks, computer vision algorithms, and algorithms that intersect rays derived from image shadows and disparity maps, to find the exact location of a light source and its illumination power. For convenient use of the algorithms, a GUI application was developed that allows selecting the necessary operating modes and evaluating their speed.
Magnetic resonance imaging (MRI) is a common medical imaging technology in modern medicine. MRI images provide valuable information for doctors to diagnose and treat patients, and segmentation of the brain into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) is an important step in neuroimage analysis. Clustering methods are widely developed for this task. However, traditional clustering segmentation is easily affected by the initial cluster centers, which makes it difficult to accurately identify and extract tissues. In this paper, we propose a bilateral-driven multi-center clustering (BMC) method to segment brain MRI images, which integrates pixel features and spatial relationships. First, traditional fuzzy c-means (FCM) is employed to perform an initial rough segmentation. Second, we propose a multi-center seeking strategy based on the algorithm of clustering by fast search and find of density peaks (CFDP) to obtain primary multi-centers for each cluster. Third, an iterative procedure seeks secondary centers and links all potential centers for each cluster; in this process, the cluster labels of some pixels in the neighborhoods of the secondary centers are determined accordingly. The proposed method is validated on the public simulated brain data from BrainWeb; experimental results show that it achieves better segmentation performance than traditional methods. Our multi-center strategy reasonably reflects the tissue distribution, which is an advantage over traditional clustering segmentation with only a single center per cluster.
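A minimal FCM sketch of the rough first stage, run on 1D intensities with synthetic data (the BMC stages built on CFDP are not shown):

```python
import numpy as np

def fuzzy_c_means(x, c=3, m=2.0, iters=50, seed=0):
    """Plain FCM on intensity values -- the rough first stage of BMC."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, c)
    for _ in range(iters):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-9          # (N, c)
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
        # Center update: weighted mean with fuzzified memberships.
        centers = (u ** m).T @ x / (u ** m).sum(0)
    return centers, u

# Synthetic stand-in for CSF/GM/WM intensity clusters.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(mu, 5, 300) for mu in (60, 110, 160)])
centers, u = fuzzy_c_means(x)
print(np.sort(centers))   # should land near 60, 110, 160
```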
Complete 3D information about an object is required in many fields. However, single-view observation always leads to a loss of 3D information. We introduce a learning-based approach to simultaneously estimate the pose and shape of a given object from a single depth map. To address the problem, the depth map is first converted to the corresponding partial point cloud; then an autoencoder-based network is proposed to learn the pose estimation and shape completion jointly. In the learning paradigm, we utilize a novel pose representation, the structured point list (SPL), to describe object pose, which enables the network to understand the pose of the input object relative to the viewpoint. Compared with direct shape reconstruction, we find that adding SPL estimation as an intermediate supervision both improves reconstruction accuracy and accelerates training convergence. Our method achieves state-of-the-art results on both rigid and non-rigid object reconstruction.
In popular visual tasks, a single model is usually used to produce the final results, but no model is perfect. In this paper, we propose a simple and general multi-model method. To combine the advantages of multiple models, we design a familiarity prediction network that outputs each model's familiarity with an image, and then select the optimal model based on the familiarity value. Since the loss is a single scalar that reflects familiarity for any task, the output of the familiarity prediction network can be regarded as an estimate of the loss value. The accuracy of the multi-model exceeds that of any of its constituent single models. By limiting the number of feature layers fed into the familiarity network, the sacrifice in computation and detection speed is kept acceptable. Our method is general and task-agnostic; it performs well not only on classification tasks but also on object detection and other vision tasks.
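The routing rule itself reduces to an argmin over estimated losses; a toy sketch with made-up familiarity values:

```python
import numpy as np

# Hypothetical predicted "familiarity" (estimated loss) of three models on a
# batch of two images; in the paper these come from the familiarity network
# fed with a limited number of feature layers.
predicted_loss = np.array([[0.8, 0.3, 0.5],   # image 0
                           [0.2, 0.6, 0.9]])  # image 1

# Route each image to the model whose estimated loss is lowest.
chosen = predicted_loss.argmin(axis=1)
print(chosen)   # [1 0]: image 0 -> model 1, image 1 -> model 0
```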
Multi-view cosegmentation of the same object is the basis of true three-dimensional imaging. Due to changes in the foreground and interference from the background, traditional cosegmentation algorithms often cannot fully and effectively extract common areas. To solve this problem, in this paper we propose a new image cosegmentation algorithm that incorporates the minimum fuzzy divergence and the active contours model. Considering the foreground similarity and background consistency between multiple images, energy functions of the images are generated. We feed the color information of one image into the energy function of another image to enhance the robustness of curve evolution, and then minimize the energy function value via the minimum fuzzy divergence. The experiments demonstrate that the proposed method can effectively segment the common objects from multi-view image pairs, yielding lower error rates than traditional cosegmentation methods.
In cities, the large amount of municipal solid waste has had a significant impact on the ecological environment. Automatic and robust waste detection and classification is a promising and challenging problem in urban solid waste disposal. The performance of classical detection and classification methods is degraded by factors such as occlusions and scale differences. To make the detection model more robust to occlusion and small items, we propose a robust waste detection method based on a cascade adversarial spatial dropout detection network (Cascade ASDDN). Hard occluded examples in the pyramid feature space are generated and used to adversarially train the detection network; they are produced by a spatial dropout module guided by Gradient-weighted Class Activation Mapping. Experiments verify the effectiveness of our method on the 2020 Haihua AI challenge waste classification dataset.
Super-resolution integral imaging based on a sparse camera array and a convolutional neural network can reduce rendering time by reducing the number of cameras, and then reconstruct the low-resolution elemental images into high-resolution elemental images with a convolutional neural network. To further improve elemental image reconstruction, this paper tunes the network optimizer and sensitive parameters, constructs the activation and loss functions, and uses a smaller convolution kernel in the last layer of the network to improve the quality of the generated elemental images. Finally, the original scheme and the improved scheme are verified and compared on the TensorFlow platform. The experimental results show that the elemental images reconstructed by the improved scheme are better and the network training time is shorter.
Research shows that a learner's emotional state has an important impact on the affective and cognitive processes underlying learning, and a positive emotional state can enhance learning outcomes. It is therefore important to detect a learner's emotional state unobtrusively during the learning process. Generally, emotions can be classified along two dimensions, valence and activation; happiness is an activating, positive-valence emotional state. This paper presents a happiness emotion detection method based on deep learning. First, face images that include static emotion are selected from an image database. Faces are detected with a face detector and aligned using eye locations, and the face images are then cropped to the proper size to match the convolutional neural network input. In our classifier, the input layer accepts a single channel to process grayscale images, and the output layer produces two classes: happiness and non-happiness. Fourfold cross-validation is performed on the facial expression image dataset, which is randomly divided into four subsets; in each round, one subset is used for testing and the other three for training. The experimental results show an average accuracy of 98.78%, which is sufficient for use in learning outcome evaluation.
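A self-contained sketch of the fourfold protocol; a linear classifier on random stand-in data replaces the paper's CNN so the example runs anywhere:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

# Random stand-ins for aligned grayscale face crops and happy/non-happy labels.
rng = np.random.default_rng(0)
X = rng.random((400, 48 * 48))
y = rng.integers(0, 2, 400)

accs = []
for tr, te in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    # Train on three subsets, test on the held-out one (CNN in the paper).
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    accs.append(clf.score(X[te], y[te]))
print(f"mean fourfold accuracy: {np.mean(accs):.3f}")
```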
The key to improving a photoelectric detection and countermeasure system is to improve its detection capability. System resolution, detection distance, detection range, response time, and signal-to-noise ratio are the important criteria, and the system optics are the core of the engineering implementation. From a global perspective, the adaptive algorithm establishes a new theory superior to the traditional method, solves the problem of freely determining the core optical solution, and opens up a new field for the development of accurate detection systems.
Light field (LF) imaging, which can capture the spatial and angular information of light rays in one shot, has received increasing attention. However, the well-known LF spatio-angular trade-off has restricted many applications of LF imaging. To alleviate this problem, this paper puts forward a dual-level LF reconstruction network to improve LF angular resolution from sparsely-sampled LF inputs. Instead of using a 2D or 3D LF representation in the reconstruction process, this paper proposes an LF directional EPI volume representation to synthesize the full LF. The proposed representation encourages interaction between the spatial and angular dimensions in the convolutional operations, which benefits the recovery of lost texture details in the synthesized sub-aperture images (SAIs). To extract the high-dimensional geometric features of the angular mapping from low-angular-resolution inputs to the full high-angular-resolution LF, a dual-level deep network is introduced, consisting of an SAI synthesis sub-network and a detail refinement sub-network, which performs LF reconstruction under a dual-level constraint (i.e., from coarse to fine). Our network model is evaluated on several real-world LF scene datasets, and extensive experiments validate that the proposed model outperforms the state of the art and also achieves better perceptual quality of the reconstructed SAIs.
Mountainous woodland covered with dense vegetation exhibits complex terrain deformation, poor surface stability, and few obvious markers, which makes accurate registration of multi-temporal remote sensing images challenging. Viewing a multi-temporal satellite image sequence as a whole matrix, we apply RPCA matrix decomposition to generate a low-rank matrix and a sparse matrix, where the columns of the low-rank matrix can be regarded as stable surface images. The original images are then registered with reference to these. This resolves the difficulty of distinguishing real scene changes from remote sensing image distortion when features are unstable and obvious markers are lacking. Based on a feature matching method and a local coordinate transformation and resampling model, the multi-temporal images are each registered with their stable surface images, finally achieving accurate batch registration of multi-temporal satellite images of mountain forestland.
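A compact RPCA sketch via inexact ALM (principal component pursuit); the paper does not specify its solver, so this is one standard choice, with a synthetic matrix standing in for the vectorized image sequence:

```python
import numpy as np

def shrink(x, tau):
    """Soft-thresholding (proximal operator of the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_threshold(x, tau):
    """Singular value thresholding (proximal operator of the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(shrink(s, tau)) @ vt

def rpca(D, iters=200):
    """Decompose D = L (low rank) + S (sparse) by inexact ALM."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = m * n / (4.0 * np.abs(D).sum())
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(iters):
        L = svd_threshold(D - S + Y / mu, 1.0 / mu)
        S = shrink(D - L + Y / mu, lam / mu)
        Y = Y + mu * (D - L - S)
    return L, S

# Columns are vectorized multi-temporal images: a rank-1 stable surface plus
# sparse outliers standing in for seasonal change and distortion.
rng = np.random.default_rng(0)
D = np.outer(rng.random(100), np.ones(12)) + (rng.random((100, 12)) < 0.05)
L, S = rpca(D)
```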
An LED full-parallax integral imaging display system is presented, consisting of an LED display, a lens array, a shielding plate, and a diffusion screen. To solve the problem of light overlapping and crosstalk between adjacent lenses in traditional integral imaging display systems, we design and fabricate a shielding plate based on theoretical analysis and simulation experiments. The holes of the shielding plate are arranged in odd and even patterns, and the diffusion screen recovers the occluded light field information to achieve a continuous and clear stereoscopic display. The experimental results show that the proposed display method is superior to the traditional integral imaging display method, and the viewer obtains an evident stereoscopic effect and a clearer image at the optimal viewing distance.
In electronic engineering and control theory, the step response plays a fundamental role in evaluating how a system responds when its input changes from zero to one in a very short time. From a practical standpoint, knowing how the system responds to a sudden input is important because large and fast deviations from the long-term steady state may have extreme effects on the component itself and on other portions of the overall system that depend on it. In addition, the step response of a dynamical system gives information about the stability of the system and about its ability to reach one stationary state when starting from another.
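A worked example: for the first-order system G(s) = 1/(tau*s + 1), the unit-step response is y(t) = 1 - exp(-t/tau), reaching about 63% of the steady state at t = tau (the time constant tau = 0.5 s below is illustrative):

```python
import numpy as np

# First-order system G(s) = 1 / (tau*s + 1): unit-step response
# y(t) = 1 - exp(-t / tau).
tau = 0.5
t = np.linspace(0.0, 5 * tau, 6)
y = 1.0 - np.exp(-t / tau)
for ti, yi in zip(t, y):
    print(f"t = {ti:4.2f} s   y = {yi:.3f}")  # ~63% at t = tau, ~99% at 5*tau
```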
Color point clouds can provide users with more realistic visual information and a better immersive experience than traditional imaging techniques. How to accurately evaluate the visual quality of color point clouds is an important issue that urgently needs solving. In this work, we propose a novel full-reference metric called Visual Quality Assessment of Color Point Clouds (VQA-CPC). Starting from the geometry and texture of a color point cloud, the proposed metric calculates the distances from the points to their geometric centroid and the distances from the points' texture coordinates to the texture centroid. A distortion measurement strategy is then designed and used to extract the features of the color point cloud. Finally, the extracted geometric and texture features are used to construct the feature vector and predict the quality of the distorted color point cloud. Moreover, we construct a color point cloud database, called NBU-PCD1.0, to verify the effectiveness of the proposed metric. Experimental results show that the proposed VQA-CPC metric outperforms existing point cloud metrics.
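A sketch of the centroid-distance feature stage with a simple statistical pooling (the paper's full feature construction and quality prediction are richer); the data are random stand-ins:

```python
import numpy as np

def centroid_distance_features(points, colors):
    """Distances to the geometric and texture centroids, pooled into a
    small feature vector, in the spirit of VQA-CPC's feature stage."""
    geo_d = np.linalg.norm(points - points.mean(axis=0), axis=1)
    tex_d = np.linalg.norm(colors - colors.mean(axis=0), axis=1)
    return np.array([geo_d.mean(), geo_d.std(), tex_d.mean(), tex_d.std()])

rng = np.random.default_rng(0)
pts, cols = rng.random((1000, 3)), rng.random((1000, 3))
print(centroid_distance_features(pts, cols))
```

Comparing these feature vectors between the reference and distorted clouds then yields a quality prediction.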
Low-light object detection is a challenging problem in computer vision and multimedia. Most available object detection methods are not accurate enough in low-light conditions. The main idea of low-light object detection is to add an image enhancement preprocessing module before the detection network. However, traditional image enhancement algorithms may cause color loss, and recent deep learning methods tend to consume too many computing resources; neither is well suited to low-light object detection. We propose an accurate low-light object detection method based on pyramid networks. A low-resolution pyramid light-enhancement network is adopted to reduce computation and memory consumption, and a super-resolution network based on an attention mechanism is placed before EfficientDet to improve detection accuracy. Experiments on the 10K RAW-RGB low-light image dataset show the effectiveness of the proposed method.
Point features and line features have been widely used in visual SLAM (simultaneous localization and mapping) algorithms. However, most of these methods assume static environments, ignoring the dynamic objects often present in the real world, which can degrade SLAM performance. To solve this problem, a line-expanded visual odometry is proposed. It calculates the optical flow between two adjacent frames to identify and eliminate dynamic point features on moving objects, uses the remaining point features to find collinear relationships that expand the line features for point-based visual SLAM, and finally estimates the camera pose from the remaining point and line features. The proposed method not only reduces the influence of dynamic objects but also avoids tracking failure caused by having too few point features. Experiments are carried out on a TUM dataset. Compared with state-of-the-art methods such as ORB (oriented FAST and rotated BRIEF) and ORB plus optical flow, the results demonstrate that the proposed method reduces tracking error and improves the robustness and accuracy of visual odometry in dynamic environments.
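A sketch of the dynamic-point elimination step using OpenCV's Lucas-Kanade tracker; the synthetic frames, the median-flow motion model, and the 3-pixel threshold are illustrative assumptions, not the paper's settings:

```python
import numpy as np
import cv2

# Synthetic pair of frames standing in for two adjacent video frames.
rng = np.random.default_rng(0)
prev = (rng.random((240, 320)) * 255).astype(np.uint8)
curr = np.roll(prev, 2, axis=1)   # global 2-pixel shift (camera motion)

p0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01,
                             minDistance=7)
p1, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None)

ok = status.ravel() == 1
flow = (p1 - p0).reshape(-1, 2)[ok]
pts = p0.reshape(-1, 2)[ok]

# Features whose flow deviates strongly from the dominant (camera) motion
# are treated as dynamic and discarded before pose estimation.
median = np.median(flow, axis=0)
residual = np.linalg.norm(flow - median, axis=1)
static_pts = pts[residual < 3.0]
print(len(pts), "tracked,", len(static_pts), "kept as static")
```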
Video inpainting is a very challenging task. Directly applying image inpainting methods to repair damaged video leads to inter-frame content flicker due to temporal discontinuities. In this paper, we introduce a video inpainting model guided by spatial structure and temporal edge information to repair missing regions in high-resolution video. The model uses a convolutional neural network with residual blocks to restore the missing contents within each frame according to the spatial structure. At the same time, the temporal edges of reference frames are introduced in the temporal domain, which strongly guides texture restoration and reduces inter-frame flicker. We train the model with regular and irregular masks on high-resolution YouTube video datasets; the trained model is qualitatively and quantitatively evaluated on the test set, and the results show that our method is superior to previous methods.
Modern chromatic equipment is widely used, and with the popularity of color digital equipment, accurate color acquisition has a wide range of uses. The colorimetric characterization of equipment is a basic link in the color management system, and accurately converting color between different color devices is a fundamental problem. The color space of a camera is device-dependent; therefore, the colorimetric characterization of a digital camera is an important method for improving the color reproduction of images, and is the basis of color conversion between devices. In camera colorimetric characterization, traditional neural network methods such as the back-propagation (BP) neural network need a large amount of sample data, and the processing is complex. Because of its fast training, small data requirement, and small resulting color difference, the radial basis function (RBF) neural network can be used to solve the colorimetric characterization problem of digital cameras. Of 140 color patches, half are used as the training set and half as the test set; an RBF neural network is trained on the data, and a traditional BP neural network is trained on the same data for comparison. The experimental results show that the average color difference is 1.79 ΔE CMC(1:1) on the training samples and 4.89 ΔE on the test samples. Compared with the traditional polynomial fitting method and the BP network, the RBF neural network yields a smaller color difference in colorimetric characterization.
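A sketch of the RGB-to-Lab mapping role the RBF network plays, here using SciPy's RBFInterpolator as a stand-in for the trained RBF neural network; the patch values below are synthetic placeholders, not measured data:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Synthetic stand-ins for 70 training patches: camera RGB -> measured Lab.
rng = np.random.default_rng(0)
rgb_train = rng.random((70, 3))
lab_train = rng.random((70, 3)) * np.array([100.0, 60.0, 60.0])

# Radial-basis mapping from device RGB to CIELAB.
model = RBFInterpolator(rgb_train, lab_train, kernel='thin_plate_spline')

rgb_test = rng.random((70, 3))
lab_pred = model(rgb_test)
print(lab_pred.shape)   # (70, 3): predicted Lab for the held-out patches
```

The color-difference figures in the abstract would then be computed between predicted and measured Lab values with the CMC(1:1) formula.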
Virtual reality (VR) refers to technology that allows people to experience a virtual world in an artificial environment. As one of the most important forms of VR media content, panoramic video provides viewers with 360-degree free viewing angles. However, the acquisition, stitching, transmission, and playback of panoramic video can damage video quality and seriously affect the viewer's quality of experience. Therefore, how to improve display quality and provide users with a better visual experience has become a hot topic in this field. When watching videos, people pay attention to salient areas, especially in panoramic videos, where viewers can freely choose regions of interest. Considering this characteristic, saliency information should be utilized when performing quality assessment. In this paper, we use two cascaded networks to calculate the quality score of panoramic video without a reference video. First, a saliency prediction network computes the saliency map of the image, and patches with higher saliency are selected through the saliency map; in this way, we exclude the areas of the panoramic image that contribute nothing positive to the quality assessment task. Then, the selected salient patches are fed into the quality assessment network for prediction to obtain the final image quality score. Experimental results show that, owing to its network structure, the proposed method achieves more accurate quality scores for panoramic videos than state-of-the-art works.
To analyze the polarization properties of biological tissues, we constructed a high-speed three-dimensional swept-source polarization-sensitive optical coherence tomography (PS-OCT) imaging system with an axial scanning rate of up to 200 kHz. To automatically detect and analyze the polarization properties of peristaltic living tissues in real time, an advanced measurement and control system was designed using laboratory virtual instrument technology. Based on the producer-consumer pattern, the system mainly comprises the sample-arm scanning galvanometer module, the interference data acquisition module, the interference data real-time processing and display module, and the interference data storage module. Each module independently completes specific sub-functions while also interacting with the others. An in-vivo human finger and in-vitro pork tissue were imaged with the system. The experimental results show that the home-made system obtains 78 cross-sectional B-scans per second, and can display two-dimensional OCT images and acquire three-dimensional OCT data in real time. This robust measurement and control system has a short design cycle, high flexibility, and real-time monitoring, and will significantly promote the development of PS-OCT in clinical applications.
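A minimal sketch of the producer-consumer pattern that decouples acquisition from real-time processing (strings stand in for B-scan buffers; the real system's modules and data types are not reproduced here):

```python
import queue
import threading

buf = queue.Queue(maxsize=64)   # bounded buffer between the two modules

def producer():
    """Stands in for the interference data acquisition module."""
    for i in range(10):
        buf.put(f"B-scan {i}")
    buf.put(None)               # sentinel: acquisition finished

def consumer():
    """Stands in for the real-time processing and display module."""
    while (item := buf.get()) is not None:
        print("processing", item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads: t.start()
for t in threads: t.join()
```

The bounded queue lets acquisition run at full speed while processing keeps pace independently, which is what allows the display and storage modules to work in real time.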
For adaptive optics without wavefront detection, a wavefront control method based on deep learning is analyzed. An adaptive optics simulation model is established; the far-field spot data collected by the photodetector are used as the input of the neural network model, and the Zernike mode coefficients are used as the output. The fully trained model can quickly and accurately recover and control the low-order wavefront. The simulation results show that a convolutional neural network can effectively extract image features and outperforms an ordinary deep neural network model. For the convolutional network model, the larger the training set, the smaller the loss function value after convergence and the higher the accuracy of the model. Compared with traditional iterative optimization control methods, the control method based on the neural network model has obvious advantages in real-time performance.