Traditional metrics for evaluating video quality do not completely capture the nuances of the Human Visual System (HVS); however, they are simple to use when quantitatively optimizing parameters in enhancement or restoration. Modern full-reference Perceptual Visual Quality Metrics (PVQMs), such as Video Multi-Method Assessment Fusion (VMAF), are more robust than traditional metrics with respect to the HVS, but they are generally complex and non-differentiable. This lack of differentiability means they cannot readily be used in optimization scenarios for enhancement or restoration. In this paper we formulate a perceptually motivated restoration framework for video. We deploy this process in the context of denoising by training a spatio-temporal denoising deep convolutional neural network (DCNN). We design DCNNs as differentiable proxies for both a spatial and a temporal version of VMAF, and use these proxies, together with traditional losses, to propose a perceptually motivated loss function for video; the proxies form part of the loss used to update the weights of the spatio-temporal DCNNs. Our results show that using the perceptual loss function as a fine-tuning step yields a higher VMAF score and lower PSNR than the spatio-temporal network trained using the traditional mean squared error loss. Using the perceptual loss function for the entirety of training yields lower VMAF and PSNR, but produces visibly less noise in its output.
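A minimal PyTorch sketch of the general idea described above: a pixel loss is blended with a term derived from a differentiable quality-metric proxy. The proxy network `vmaf_proxy`, the two-argument calling convention, and the weighting term `alpha` are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class PerceptualLoss(nn.Module):
    """Blend a traditional pixel loss with a differentiable
    quality-metric proxy (higher proxy score = better quality)."""
    def __init__(self, proxy: nn.Module, alpha: float = 0.5):
        super().__init__()
        self.proxy = proxy            # frozen, pretrained proxy network (assumed)
        self.alpha = alpha            # illustrative balance between the two terms
        self.mse = nn.MSELoss()
        for p in self.proxy.parameters():
            p.requires_grad = False   # only the denoiser's weights are updated

    def forward(self, denoised, reference):
        pixel_term = self.mse(denoised, reference)
        # Proxy is assumed to predict a quality score in [0, 1];
        # subtract from 1 so that better quality gives a lower loss.
        quality = self.proxy(denoised, reference).mean()
        return self.alpha * pixel_term + (1.0 - self.alpha) * (1.0 - quality)
```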
This paper investigates how conventional in-situ sensor networks can be complemented by the satellite data streams available from the numerous platforms orbiting the Earth, and by the combined analysis products available through services such as MyOcean. Despite the many benefits of satellite remote sensing data products, their use in coastal zones is subject to a number of limitations. Here, the ability of these data sources to provide contextual awareness, redundancy and increased efficiency to an in-situ sensor network is investigated. The potential use of a variety of chlorophyll and sea surface temperature (SST) data products as additional data sources in the SmartBay monitoring network in Galway Bay, Ireland, is analysed, with the ultimate goal of creating a smarter, more efficient marine monitoring network. Overall, it was found that, while care is needed in choosing these products, several performed very promisingly and would be suitable for a range of applications, especially in relation to SST. It was more difficult to reach conclusive results for the chlorophyll analysis.
Remote sensing technology continues to play a significant role in understanding our environment and investigating the Earth. Ocean color is the hue of the water due to the presence of tiny plants containing the pigment chlorophyll, sediments, and colored dissolved organic material, and so can provide valuable information on coastal ecosystems. We propose to make the browsing of Ocean Color data more efficient for users by using image processing techniques to extract useful information that can be accessed through browser searching. Image processing is applied to chlorophyll and sea surface temperature images. Automatic image processing of the visual level 1 and level 2 data allows us to investigate the occurrence of algal blooms: images with colors in a certain range (red, orange, etc.) are used to flag possible algal blooms and to examine their seasonal variation in Europe (around Ireland and in the Baltic Sea). The yearly seasonal variation of algal blooms in Europe, derived from this image processing for smarter browsing of Ocean Color, is presented.
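A minimal sketch of the kind of color-range test described above, operating on an RGB image array with NumPy. The thresholds and the fraction-based decision rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def flag_possible_bloom(rgb, min_fraction=0.02):
    """Flag an image if enough pixels fall in a red/orange range.

    rgb: HxWx3 uint8 array; all thresholds below are illustrative only.
    """
    r = rgb[..., 0].astype(np.int16)
    g = rgb[..., 1].astype(np.int16)
    b = rgb[..., 2].astype(np.int16)
    # "Red/orange" here means red clearly dominates green and blue.
    mask = (r > 150) & (r - g > 40) & (r - b > 60)
    return mask.mean() >= min_fraction, mask
```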
This paper details and evaluates a system that aims to provide continuous, robust localisation ('tracking') of vehicles throughout the scenes of aerial video footage captured by Unmanned Aerial Vehicles (UAVs). UAV object tracking is well studied in the field of computer vision, with a variety of solutions offered. However, rigorous evaluation is infrequent, and further novelty lies here in our exploration of the benefits of combined-modality processing, in conjunction with a proposed adaptive feature weighting technique. Building on our previously reported framework for object tracking in multi-spectral video [1], moving vehicles are initially located by exploiting their intra-scene displacement within a camera-motion-compensated video-image domain. For each detected vehicle, a spatiogram-based [2] representation is then extracted; a spatiogram is a representative form that aims to bridge the gap between the 'coarseness' of histograms and the 'rigidity' of pixel templates. Spatiogram-based region matching then ensues for each vehicle, to determine its new location throughout the subsequent frames of the video sequence. The framework is flexible in that, in addition to exploiting traditional visible-spectrum features, it can accommodate additional feature sources, demonstrated here via the attachment of an infrared channel. Furthermore, the system provides the option of enabling an adaptive feature weighting mechanism, whereby the transient ability of certain features to occasionally outperform others is exploited in an adaptive manner, to the envisaged benefit of increased tracking robustness. The system was developed and tested using the DARPA VIVID2 video dataset [3], a suite of multi-spectral (visible and thermal infrared) video files captured from an airborne platform flying at various altitudes. Evaluation of the system is quantitative, which differentiates it from a large portion of the existing literature, while the results observed serve to further reveal the challenging nature of this problem.
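One simple way to realise the adaptive weighting idea sketched above is to shift weight toward feature channels that have matched well recently. The exponential-smoothing scheme below is an illustrative assumption, not the paper's exact mechanism.

```python
import numpy as np

def update_weights(weights, match_scores, rate=0.3):
    """Shift weight toward feature channels that matched well recently.

    weights: current per-channel weights (sums to 1).
    match_scores: per-channel similarity of the latest match, each in [0, 1].
    rate: illustrative adaptation speed.
    """
    scores = np.asarray(match_scores, dtype=float)
    target = scores / (scores.sum() + 1e-9)       # normalise scores to weights
    weights = (1 - rate) * weights + rate * target
    return weights / weights.sum()

# Example: the IR channel matched better than RGB in the last frame.
w = np.array([0.5, 0.5])                          # [RGB, IR]
w = update_weights(w, [0.4, 0.9])
```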
Changes in sea surface temperature can be used as an indicator of water quality. In-situ sensors are being used for continuous autonomous monitoring. However, these sensors have limited spatial resolution as they are, in effect, single-point sensors. Satellite remote sensing can provide better spatial coverage at good temporal scales, although in-situ sensors offer a richer temporal scale at a particular point of interest. Work carried out in Galway Bay has combined data from multiple satellite sources and in-situ sensors and investigated the benefits and drawbacks of using multiple sensing modalities to monitor a marine location.
In this paper we provide an overview of a content-based retrieval (CBR) system that has been specifically designed for handling UAV video and associated metadata. Our emphasis in designing this system is on managing large quantities of such information and providing intuitive and efficient access mechanisms to this content, rather than on analysis of the video content. The retrieval unit in our system is termed a "trip". At capture time, each trip consists of an MPEG-1 video stream and a set of time-stamped GPS locations. An analysis process automatically selects and associates GPS locations with the video timeline. The indexed trip is then stored in a shared trip repository. The repository forms the backend of an MPEG-21 [1] compliant Web 2.0 application for subsequent querying, browsing, annotation and video playback. The system interface allows users to search and browse across the entire archive of trips and, depending on their access rights, to annotate other users' trips with additional information. Interaction with the CBR system is via a novel interactive map-based interface. This interface supports content access by time, date, region of interest on the map, previously annotated specific locations of interest, and combinations of these. To develop such a system and investigate its practical usefulness in real-world scenarios, a significant amount of appropriate data is clearly required. In the absence of a large volume of UAV data with which to work, we have simulated UAV-like data using GPS-tagged video content captured from moving vehicles.
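A minimal sketch of the timeline-association step, assuming each GPS fix carries a POSIX timestamp and each frame's timestamp is known; the nearest-fix rule is an illustrative choice, not necessarily the system's actual analysis process.

```python
import bisect

def associate_gps(frame_times, gps_fixes):
    """Map each video frame time to the nearest GPS fix in time.

    frame_times: sorted list of frame timestamps (seconds).
    gps_fixes: sorted list of (timestamp, lat, lon) tuples.
    """
    stamps = [t for t, _, _ in gps_fixes]
    index = {}
    for ft in frame_times:
        i = bisect.bisect_left(stamps, ft)
        # Pick whichever neighbouring fix is closer in time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(stamps)]
        best = min(candidates, key=lambda j: abs(stamps[j] - ft))
        index[ft] = gps_fixes[best]
    return index
```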
In this paper we present the results of applying a general-purpose feature combination framework for tracking to the specific task of tracking vehicles in UAV datasets. In the fusion framework used (previously presented elsewhere [1]), vehicles' pixel-based features from multiple channels, specifically RGB and thermal IR, are split across separate individual spatiogram trackers. The use of spatiograms allows some spatial information to be embedded into the models while avoiding the exponential increase in computational load and memory requirements associated with the more commonly used histogram. This tracking framework is embedded in a complete system for detecting and tracking vehicles. The system first carries out pre-processing to ensure spatially and temporally aligned visible-spectrum and IR data prior to tracking. Vehicle detection in the initial two frames is achieved by first compensating for camera motion, followed by frame differencing and post-processing (thresholding and size filtering) to identify vehicle regions. Each vehicle is then described by a bounding box, which is used to generate a set of spatiograms for each of the available data channels. The detected vehicle is then tracked using the spatiogram tracker framework. Results of experiments on a variety of UAV datasets indicate the promising performance of the overall system, even in the presence of significant illumination variation, partial and full occlusions, and significant camera motion and focus change. The results are particularly encouraging given that we do not periodically re-initialise the detection phase, which points to the robustness of the tracking framework.
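For reference, a second-order spatiogram augments each histogram bin with the mean and covariance of the positions of the pixels falling in that bin, which is how it embeds spatial information at modest cost. A minimal NumPy sketch for a single 8-bit channel; the bin count is an illustrative parameter.

```python
import numpy as np

def spatiogram(channel, n_bins=16):
    """Second-order spatiogram of a single-channel image patch.

    channel: HxW uint8 array. Returns per-bin pixel counts (normalised),
    mean pixel positions, and position covariances.
    """
    h, w = channel.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    bins = (channel.ravel().astype(int) * n_bins) // 256

    counts = np.zeros(n_bins)
    means = np.zeros((n_bins, 2))
    covs = np.zeros((n_bins, 2, 2))
    for b in range(n_bins):
        pts = coords[bins == b]
        counts[b] = len(pts)
        if len(pts) > 1:
            means[b] = pts.mean(axis=0)
            covs[b] = np.cov(pts, rowvar=False)
    return counts / counts.sum(), means, covs
```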
The SenseCam is a prototype device from Microsoft that facilitates automatic capture of images of a person's
life by integrating a colour camera, storage media and multiple sensors into a small wearable device. However,
efficient search methods are required to reduce the user's burden of sifting through the thousands of images that
are captured per day. In this paper, we describe experiments using colour spatiogram and block-based cross-correlation
image features in conjunction with accelerometer sensor readings to cluster a day's worth of data into
meaningful events, allowing the user to quickly browse a day's captured images. Two different low-complexity
algorithms are detailed and evaluated for SenseCam image clustering.
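A minimal sketch of one low-complexity clustering strategy consistent with the description above: score the dissimilarity between adjacent images (a generic feature distance standing in for the spatiogram/cross-correlation features), boost it where the accelerometer indicates movement, and cut events at peaks. The thresholds and the boost term are illustrative assumptions.

```python
import numpy as np

def event_boundaries(features, accel_energy, threshold=0.5, motion_boost=0.2):
    """Return indices where a new event likely starts.

    features: NxD array, one feature vector per captured image.
    accel_energy: length-N array of accelerometer activity in [0, 1].
    """
    cuts = []
    for i in range(1, len(features)):
        d = np.linalg.norm(features[i] - features[i - 1])
        d /= (np.linalg.norm(features[i]) + np.linalg.norm(features[i - 1]) + 1e-9)
        score = d + motion_boost * accel_energy[i]   # movement hints at a new event
        if score > threshold:
            cuts.append(i)
    return cuts
```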
Video retrieval is mostly based on using text from dialogue, and this remains the most significant component despite progress in other aspects. One problem with this arises when a searcher wants to locate video based on what appears in the video rather than what is being spoken about. Alternatives such as automatically detected features and image-based keyframe matching can be used, though these still need further improvement in quality.
Another modality for video retrieval is based on segmenting objects from video and allowing end-users to use these as part of querying. This approach matches query objects against objects from video by similarity and, in theory, allows retrieval based on what is actually appearing on-screen. The main hurdles to its wider use are the overhead of object segmentation over large amounts of video and the question of whether effective object-based retrieval can actually be achieved.
We describe a system to support object-based video retrieval in which a user selects example video objects as part of the query. During a search, the user builds up a set of these, which are matched against objects previously segmented from a video library. This matching is based on the MPEG-7 Dominant Colour, Shape Compaction and Texture Browsing descriptors. We use a user-driven, semi-automated segmentation process to segment the video archive, which is very accurate and faster than conventional video annotation.
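A minimal sketch of how per-descriptor distances might be fused into a single match score; the dictionary keys, the L1 distance, and the equal weights are illustrative placeholders, not the MPEG-7 reference matching procedure.

```python
def object_distance(query, candidate, weights=(1/3, 1/3, 1/3)):
    """Weighted sum of normalised per-descriptor distances.

    query/candidate: dicts of precomputed descriptor vectors under the
    assumed keys 'colour', 'shape' and 'texture'.
    """
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b)) / max(len(a), 1)

    keys = ('colour', 'shape', 'texture')
    return sum(w * l1(query[k], candidate[k]) for w, k in zip(weights, keys))

# Library objects would then be ranked by ascending distance to the query:
# best = min(library, key=lambda obj: object_distance(query, obj))
```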
KEYWORDS: Logic, Multiplexers, Switching, Clocks, Very large scale integration, Video, Video processing, Multiplexing, Multimedia, Algorithm development
The explosive growth of the mobile multimedia industry has accentuated the need for efficient VLSI implementations of the associated computationally demanding signal processing algorithms. This need becomes greater as end-users demand increasingly enhanced features and more advanced underpinning video analysis. One such feature is object-based video processing, as supported by the MPEG-4 core profile, which allows content-based interactivity. MPEG-4 has many computationally demanding underlying algorithms, an example of which is the Shape Adaptive Discrete Cosine Transform (SA-DCT). The dynamic nature of the SA-DCT processing steps poses significant VLSI implementation challenges, and many of the previously proposed approaches use area- and power-consumptive multipliers. Most also ignore the subtleties of the packing steps and the manipulation of the shape information. We propose a new multiplier-less serial datapath, based solely on adders and multiplexers, to improve area and power. The adder cost is minimised by employing resource re-use methods, and the number of (physical) adders used has been derived using a common sub-expression elimination algorithm. Additional energy efficiency is factored into the design by employing guarded evaluation and local clock gating. Our design implements the SA-DCT packing with minimal switching, using efficient addressing logic with a transpose memory RAM. The entire design has been synthesized using TSMC 0.09 μm TCBN90LP technology, yielding a gate count of 12,028 for the datapath and its control logic.
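To illustrate the multiplier-less idea in software terms: multiplication by a constant decomposes into shifts and additions, and common sub-expression elimination lets several constants share intermediate terms, reducing the adder count. The constants below are illustrative, not the actual SA-DCT coefficients.

```python
def mul_by_45(x: int) -> int:
    """45*x using only shifts and adds: 45x = 5x + (5x << 3)."""
    t = x + (x << 2)          # shared sub-expression: 5x
    return t + (t << 3)       # 5x + 40x = 45x

def mul_by_20(x: int) -> int:
    """20*x reuses the same 5x sub-expression: 20x = 5x << 2."""
    return (x + (x << 2)) << 2
```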
In this paper, an efficient tool to extract video objects from video sequences is presented. With this tool, it is possible to segment video content in a user-friendly manner, providing easy manipulation of video content. The tool comprises two stages. First, initial object extraction is performed using the Recursive Shortest Spanning Tree (RSST) algorithm and the Binary Partition Tree (BPT) technique. Second, automatic object tracking is performed using a single-frame forward region tracking method. In the first stage, an initial partition is created using the RSST algorithm, which allows the user to specify the initial number of regions. This process is followed by progressive binary merging of these regions to create the BPT, the purpose of which is to allow the user to browse the content of the scene in a hierarchical manner. This merging step creates a binary tree with nearly double the user-specified number of homogeneous regions. User interaction then allows particular regions to be grouped into objects. In the second stage, each subsequent frame is segmented using the RSST, and corresponding regions are identified using a forward region tracking method.
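A minimal sketch of the binary-merging step that builds a partition tree from an initial set of regions; the mean-colour similarity criterion and the simple averaging on merge are illustrative stand-ins for the actual RSST/BPT merging cost.

```python
import itertools
import numpy as np

def build_bpt(regions):
    """Greedily merge the two most similar regions until one remains.

    regions: dict {int id: mean colour vector}. Returns the merge order
    as (child_a, child_b, new_id) tuples, i.e. the binary partition tree.
    """
    regions = {k: np.asarray(v, dtype=float) for k, v in regions.items()}
    merges, next_id = [], max(regions) + 1
    while len(regions) > 1:
        a, b = min(itertools.combinations(regions, 2),
                   key=lambda p: np.linalg.norm(regions[p[0]] - regions[p[1]]))
        # Illustrative merge rule: average the two mean colours.
        regions[next_id] = (regions.pop(a) + regions.pop(b)) / 2
        merges.append((a, b, next_id))
        next_id += 1
    return merges
```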
KEYWORDS: Televisions, Video, Human-machine interfaces, Image segmentation, Video processing, Digital video recorders, Library classification systems, Visualization, Computing systems, Control systems
This paper describes the organizational and playback features of Fischlar, a digital video library that allows users to record, browse and watch television programs on-line. Programs that can be watched and recorded are organized by personal recommendations, genre classifications, name and other attributes, for access by general television users. The motivations and interactions of users with on-line television libraries are outlined; these are supported by personalized library access, categorized programs, and a combined player-browser with a content viewing history and content marks. The combined player-browser supports a user who watches a program on different occasions in a non-sequential order.