Video compression is complicated by the degradation present in User Generated Content (UGC). Preprocessing the data before encoding improves compression. However, the impact of the preprocessor depends not only on the codec and the filter strength of the preprocessor being used, but also on the target bitrate of the encode and the level of degradation. In this paper we present a framework for modelling this relationship and estimating the optimal filter strength for a particular codec/preprocessor/bitrate/degradation combination. We examine two preprocessors, based on classical and DNN ideas respectively, and two codecs, AV1 and VP9. We find that preprocessing can yield up to 2 dB of quality gain at constant bitrate, and that our estimator is accurate enough to capture most of these gains.
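As an illustrative sketch of the kind of search this abstract describes (not the paper's actual estimator), the snippet below brute-forces the filter strength that maximises quality at a fixed bitrate. `preprocess` and `encode_and_measure` are hypothetical helpers standing in for a real denoising filter and a real AV1/VP9 encode followed by a quality measurement against the clean reference.

```python
# Minimal sketch, assuming hypothetical helpers: find the preprocessor filter
# strength that maximises quality at a fixed target bitrate.

def best_filter_strength(clip, bitrate_kbps, strengths, preprocess, encode_and_measure):
    """Return (strength, quality) giving the highest quality at the target bitrate."""
    best_s, best_q = None, float("-inf")
    for s in strengths:
        filtered = preprocess(clip, strength=s)               # e.g. classical or DNN denoiser
        quality = encode_and_measure(filtered, bitrate_kbps)  # e.g. PSNR or VMAF vs. clean source
        if quality > best_q:
            best_s, best_q = s, quality
    return best_s, best_q
```

In practice the paper's contribution is to estimate this optimum from clip and degradation characteristics rather than exhaustively encoding every strength.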
Since the adoption of VP9 by Netflix in 2016, royalty-free coding standards have continued to gain prominence through the activities of the AOMedia consortium. AV1, the latest open source standard, is now widely supported. In the early years after standardisation, HDR video tended to be under-served in open source encoders for a variety of reasons, including the relatively small amount of true HDR content being broadcast and the challenges of RD optimisation with that material. AV1 codec optimisation has been ongoing since 2020, including consideration of the computational load. In this paper, we explore the idea of directly optimising the Lagrangian λ parameter used in the rate control of the encoders to estimate the optimal Rate-Distortion trade-off achievable for a video clip signalled as High Dynamic Range. We show that by adjusting the Lagrange multiplier in the RD optimisation process on a frame-hierarchy basis, we are able to increase the Bjøntegaard delta rate gains by more than 3.98× on average without visually affecting the quality.
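A minimal sketch of per-hierarchy-level λ scaling of the kind described is given below. The baseline λ–QP mapping follows common HEVC-style practice and the per-level multipliers are free parameters to be tuned; neither the constants nor the multipliers are the values used in the paper.

```python
# Illustrative only: scale the rate-control Lagrange multiplier per temporal
# (frame-hierarchy) level. Constants are conventional, not the paper's.

def lambda_for_frame(qp: int, level: int, k=(1.0, 0.9, 0.8, 0.7), c: float = 0.57) -> float:
    """Return an RD lambda for a frame at the given hierarchy level."""
    base = c * 2.0 ** ((qp - 12) / 3.0)   # conventional lambda-QP relation
    return k[min(level, len(k) - 1)] * base
```

The reported BD-rate gains come from searching for multipliers of this kind that best suit HDR material, rather than using the encoder defaults.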
Recent advances have shown that latent representations of pre-trained Deep Convolutional Neural Networks (DCNNs) for classification can be modelled to generate scores that correlate well with human perceptual judgement. In this paper we extend the use of perceptually relevant losses to training a DCNN for video compression artefact removal. We use the internal representations of a pre-trained classification network as the basis of the loss functions. Specifically, the LPIPS metric and a perceptual discriminator are responsible for low-level and high-level features respectively. The perceptual discriminator uses differing internal feature representations of the VGG network as its first stage of feature extraction. Initial results show an increase in performance on perceptually based metrics such as VMAF, LPIPS and BRISQUE, with a decrease in performance in PSNR.
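The following PyTorch sketch shows one way such a loss can be composed: the LPIPS distance for low-level perceptual similarity plus an adversarial term from a discriminator operating on VGG features. `FeatureDiscriminator` is a placeholder, not the paper's architecture, and the weights are illustrative assumptions.

```python
# Sketch of an LPIPS + perceptual-discriminator generator loss (assumed weights).
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_loss = lpips.LPIPS(net='vgg')  # expects inputs scaled to [-1, 1]

def generator_loss(restored, target, discriminator, w_lpips=1.0, w_adv=0.01):
    perceptual = lpips_loss(restored, target).mean()
    logits = discriminator(restored)  # discriminator built on VGG feature maps
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return w_lpips * perceptual + w_adv * adv
```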
The technology climate for video streaming changed vastly during 2020. Since the start of the pandemic, video traffic over the internet has increased dramatically. This has sharpened interest in the bitrate/quality trade-off for video compression in video streaming and real-time video communications. As far as we know, the impact of different artefacts on that trade-off has not previously been systematically evaluated. In this paper we propose a methodology for measuring the impact of various degradations (noise, grain, flicker, shake) in a video compression pipeline. We show that noise/grain has the largest impact on codec performance, but that modern codecs are more robust to this artefact. In addition, we report on the impact of a denoising module deployed as a pre-processor and show that its performance metrics change when measured in the context of the pipeline. Denoising would therefore benefit from being treated as part of the processing pipeline in both development and testing.
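A sketch of the evaluation loop this methodology implies is shown below: each degradation is applied to a clean source, encoded over a bitrate ladder, and its RD curve is compared against the clean source's curve. `apply_degradation`, `encode_and_measure_quality` and `bd_rate` are hypothetical helpers, not code from the paper.

```python
# Illustrative loop (hypothetical helpers): measure per-degradation BD-rate impact.

def degradation_impact(clean_clip, degradations, ladder_kbps,
                       apply_degradation, encode_and_measure_quality, bd_rate):
    impact = {}
    clean_curve = [(b, encode_and_measure_quality(clean_clip, b)) for b in ladder_kbps]
    for name in degradations:                        # e.g. "noise", "grain", "flicker", "shake"
        degraded = apply_degradation(clean_clip, name)
        curve = [(b, encode_and_measure_quality(degraded, b)) for b in ladder_kbps]
        impact[name] = bd_rate(clean_curve, curve)   # % bitrate change at equal quality
    return impact
```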
Traditional metrics for evaluating video quality do not completely capture the nuances of the Human Visual System (HVS); however, they are simple to use for quantitatively optimizing parameters in enhancement or restoration. Modern Full-Reference Perceptual Visual Quality Metrics (PVQMs) such as Video Multi-method Assessment Fusion (VMAF) are more robust than traditional metrics in terms of the HVS, but they are generally complex and non-differentiable. This lack of differentiability means that they cannot be readily used in optimization scenarios for enhancement or restoration. In this paper we formulate a perceptually motivated restoration framework for video. We deploy this process in the context of denoising by training a spatio-temporal denoising deep convolutional neural network (DCNN). We design DCNNs as differentiable proxies for both a spatial and a temporal version of VMAF. These proxies are used as part of the proposed loss function when updating the weights of the spatio-temporal DCNN. We combine these proxies with traditional losses to propose a perceptually motivated loss function for video. Our results show that using the perceptual loss function as a fine-tuning step yields a higher VMAF score and lower PSNR than the spatio-temporal network trained using the traditional mean squared error loss. Using the perceptual loss function for the entirety of training yields lower VMAF and PSNR, but produces visibly less noise in its output.
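The loss composition described can be sketched as below: traditional MSE blended with differentiable proxies for spatial and temporal VMAF. The proxy networks `vmaf_s_proxy` and `vmaf_t_proxy` are assumed to be pre-trained DCNNs returning scores in [0, 100]; the blend weights are illustrative, not the paper's.

```python
# Sketch of a perceptually motivated loss: MSE plus differentiable VMAF proxies.
import torch
import torch.nn.functional as F

def perceptual_video_loss(denoised, clean, vmaf_s_proxy, vmaf_t_proxy,
                          w_mse=1.0, w_s=0.5, w_t=0.5):
    mse = F.mse_loss(denoised, clean)
    # The proxies are differentiable, so (100 - score) can be minimised directly.
    spatial = (100.0 - vmaf_s_proxy(denoised, clean)).mean() / 100.0
    temporal = (100.0 - vmaf_t_proxy(denoised, clean)).mean() / 100.0
    return w_mse * mse + w_s * spatial + w_t * temporal
```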
Today's video transcoding pipelines choose transcoding parameters based on rate-distortion curves, which mainly focus on the relative quality difference between the original and transcoded videos. By investigating the recently released YouTube UGC dataset, we found that human subjects are more tolerant of changes in low-quality videos than in high-quality ones, which suggests that current transcoding frameworks can be further optimized by considering the perceptual quality of the input. In this paper, an efficient machine learning metric is proposed to detect low-quality inputs, whose bitrate can then be reduced without sacrificing perceptual quality. To evaluate the impact of our method on perceptual quality, we conducted a crowd-sourced subjective experiment and provide a methodology for evaluating statistical significance among different treatments. The results show that the proposed quality-guided transcoding framework is able to reduce the average bitrate by up to 5% with insignificant perceptual quality degradation.
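The gating idea can be illustrated as follows; `classifier` stands in for the learned low-quality detector and the reduction factor is a hypothetical example, not the paper's operating point.

```python
# Illustrative gate (not the paper's model): lower the target bitrate only when
# a lightweight quality classifier flags the input as low quality.
def choose_target_bitrate(input_clip, default_kbps, classifier, reduction=0.95):
    """`classifier` is a hypothetical model returning True for low-quality inputs."""
    return default_kbps * reduction if classifier(input_clip) else default_kbps
```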
HTTP-based video streaming techniques are now widely deployed to deliver video streams over communication networks. With these techniques, a video player can dynamically select a video stream from a set of pre-encoded representations of the video source based on its available bandwidth and viewport size. The bitrates of the encoded representations thus determine both the video quality presented to viewers and the average streaming bitrate, which is closely related to streaming cost for massive video streaming platforms. Our work minimizes the average streaming bitrate on a per-chunk basis by modeling the probability that a player observes a particular representation. Since the popularity of videos is regional, this paper exploits a further optimization that uses regional statistics of client bandwidth and viewport instead of global statistics. Simulation results demonstrate that using regional statistics reduces streaming cost for low-bandwidth regions while improving the delivered quality for high-bandwidth regions, compared with a baseline configuration that uses global statistics.
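The per-chunk objective described amounts to a probability-weighted sum of the representation bitrates, where the probabilities come from (regional) client bandwidth and viewport statistics. The sketch below assumes a hypothetical `observe_prob` model of the player's ABR selection behaviour.

```python
# Minimal sketch: expected streaming bitrate of one chunk under a region's statistics.

def expected_chunk_bitrate(rep_bitrates_kbps, region_stats, observe_prob):
    """E[bitrate] = sum_i P(player picks rep i | region) * bitrate_i."""
    probs = [observe_prob(r, rep_bitrates_kbps, region_stats) for r in rep_bitrates_kbps]
    total = sum(probs) or 1.0   # normalise in case the model is unnormalised
    return sum(p / total * r for p, r in zip(probs, rep_bitrates_kbps))
```

Choosing the ladder bitrates to minimise this expectation per region, subject to a quality constraint, is the optimisation the abstract refers to.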
KEYWORDS: Video, Video compression, Visualization, Video coding, Video processing, Visual compression, Algorithm development, Semantic video, Control systems
The development of video quality metrics and perceptual video quality metrics has been a well-established pursuit for more than 25 years. This body of work has been most relevant for improving the performance of visual compression algorithms. However, modelling human perception of video with an algorithm of some sort is notoriously complicated. As a result, the perceptual coding of video remains challenging and no standards have incorporated perceptual video quality metrics within their specification. In this paper we present the use of video metrics at the system level of a video processing pipeline. We show that it is possible to combine the artefact detection and correction process by posing the problem as a classification exercise. We also present the use of video metrics as part of a classical testing pipeline for software infrastructure, but one that is sensitive to the perceived quality of picture degradation.
Video quality metrics are essential for improving video processing algorithms. Common video quality metrics are simple averages of independently computed per-frame spatial metrics, but human quality perception is not uniform across frames. In particular, the order of frames matters, as do content complexity and scene changes. In this work, we develop a video quality framework that comprehensively integrates both spatial and temporal metrics at three levels: frame, scene, and full video. We experimentally demonstrate improved correlation of spatial metrics with human evaluation, as well as a new well-correlated temporal metric (jerkiness) based on this framework.
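A sketch of the hierarchical pooling this framework implies is given below: per-frame scores are pooled within scenes, and scene scores are pooled over the whole video. The specific pooling choices (mean within scenes, lower percentile across scenes) are illustrative assumptions rather than the framework's exact definitions.

```python
# Sketch of frame -> scene -> video pooling of a per-frame quality metric.
import numpy as np

def pool_video_score(frame_scores, scene_boundaries, scene_percentile=20):
    """frame_scores: per-frame metric values; scene_boundaries: frame indices
    starting each scene (the first entry should be 0)."""
    bounds = list(scene_boundaries) + [len(frame_scores)]
    scene_scores = [np.mean(frame_scores[bounds[i]:bounds[i + 1]])
                    for i in range(len(bounds) - 1)]
    # Emphasise the worst scenes, reflecting non-uniform perception across frames.
    return float(np.percentile(scene_scores, scene_percentile))
```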