Video compression is complicated by degradation in User Generated Content (UGC). Preprocessing the data before encoding improves compression; however, the impact of the preprocessor depends not only on the codec and the filter strength of the preprocessor but also on the target bitrate of the encode and the level of degradation. In this paper we present a framework for modelling this relationship and estimating the optimal filter strength for a particular codec/preprocessor/bitrate/degradation combination. We examine two preprocessors, one based on classical ideas and one on DNNs, and two codecs, AV1 and VP9. We find that up to 2 dB of quality gain can result from preprocessing at constant bitrate, and that our estimator is accurate enough to capture most of these gains.
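A minimal sketch of such an estimation by exhaustive search, assuming ffmpeg with the hqdn3d denoiser and libaom-av1; the filter, codec flags and the compute_vmaf() helper are illustrative stand-ins, not the paper's configuration or model.

# Hypothetical sketch: search for the best denoiser strength at a fixed target bitrate.
# Assumes ffmpeg with libaom-av1 and the hqdn3d filter; compute_vmaf() is a stand-in
# for any full-reference quality measurement of the decoded output against the source.
import subprocess

def encode(src, out, strength, bitrate_kbps):
    """Denoise with hqdn3d at the given strength, then encode with AV1 at a fixed bitrate."""
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"hqdn3d={strength}",
        "-c:v", "libaom-av1", "-b:v", f"{bitrate_kbps}k",
        out,
    ], check=True)

def best_strength(src, bitrate_kbps, strengths, compute_vmaf):
    """Return the filter strength that maximises quality at the given bitrate."""
    scores = {}
    for s in strengths:
        out = f"enc_s{s}.mkv"
        encode(src, out, s, bitrate_kbps)
        scores[s] = compute_vmaf(reference=src, distorted=out)  # assumed helper
    return max(scores, key=scores.get)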
Since the adoption of VP9 by Netflix in 2016, royalty-free coding standards have continued to gain prominence through the activities of the AOMedia consortium. AV1, the latest open source standard, is now widely supported. In the early years after standardisation, HDR video tended to be underserved in open source encoders for a variety of reasons, including the relatively small amount of true HDR content being broadcast and the challenges of RD optimisation with that material. AV1 codec optimisation has been ongoing since 2020, including consideration of the computational load. In this paper, we explore the idea of directly optimising the Lagrangian λ parameter used in the rate control of the encoders to estimate the optimal Rate-Distortion trade-off achievable for a video clip signalled as High Dynamic Range. We show that by adjusting the Lagrange multiplier in the RD optimisation process on a frame-hierarchy basis, we are able to increase the Bjontegaard difference rate gains by more than 3.98× on average without visibly affecting quality.
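For reference, the standard RD cost that the encoder minimises, together with one possible per-hierarchy-level scaling of the multiplier; the scale factors k_ℓ below are illustrative, not the values used in the paper:

J = D + \lambda R, \qquad \lambda_{\ell} = k_{\ell}\,\lambda_{\mathrm{base}}, \quad \ell = 0,\dots,L-1,

where D is the distortion, R the rate, and ℓ indexes the levels of the frame hierarchy.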
Recent advances have shown that latent representations of pre-trained Deep Convolutional Neural Networks (DCNNs) for classification can be used to generate scores that are well correlated with human perceptual judgement. In this paper we seek to extend the use of perceptually relevant losses to training a DCNN for video compression artefact removal. We use internal representations of a pre-trained classification network as the basis of the loss functions: specifically, the LPIPS metric and a perceptual discriminator are responsible for low-level and high-level features respectively. The perceptual discriminator uses differing internal feature representations of the VGG network as its first stage of feature extraction. Initial results show an increase in performance on perceptually based metrics such as VMAF, LPIPS and BRISQUE, alongside a decrease in PSNR.
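A minimal PyTorch sketch of a combined loss of this kind, assuming the off-the-shelf lpips package for the low-level term and a small discriminator operating on frozen VGG-16 features for the high-level term; the discriminator architecture and the weights w_mse / w_lpips / w_adv are illustrative, not the paper's.

# Sketch only: combined low-level (LPIPS) + high-level (VGG-feature discriminator) loss.
import torch
import torch.nn as nn
import lpips
from torchvision import models

# Frozen VGG-16 features up to relu3_3 (256 channels) as the first stage of the discriminator.
vgg_features = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

lpips_loss = lpips.LPIPS(net='vgg')  # low-level perceptual term, inputs scaled to [-1, 1]

discriminator = nn.Sequential(       # high-level term: judges restored frames from VGG features
    nn.Conv2d(256, 128, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 3, padding=1),
)

def generator_loss(restored, clean, w_mse=1.0, w_lpips=1.0, w_adv=0.01):
    """Total loss for the artefact-removal network (illustrative weighting)."""
    mse = nn.functional.mse_loss(restored, clean)
    perc = lpips_loss(restored, clean).mean()
    adv = -discriminator(vgg_features(restored)).mean()  # simple adversarial term for the generator
    return w_mse * mse + w_lpips * perc + w_adv * adv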
The technology climate for video streaming changed vastly during 2020. Since the start of the pandemic, video traffic over the internet has increased dramatically, heightening interest in the bitrate/quality trade-off for video compression in streaming and real-time video communications. As far as we know, the impact of different artefacts on that trade-off has not previously been systematically evaluated. In this paper we propose a methodology for measuring the impact of various degradations (noise, grain, flicker, shake) in a video compression pipeline. We show that noise/grain has the largest impact on codec performance, but that modern codecs are more robust to this artefact. In addition, we report on the impact of a denoising module deployed as a pre-processor and show that its performance metrics change when evaluated in the context of the pipeline. Denoising would therefore benefit from being treated as part of the processing pipeline in both development and testing.
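One way such a measurement could be organised is sketched below: encode a clean clip and a synthetically degraded variant over a bitrate ladder and compare the two RD curves. The add_degradation() and bd_rate() helpers, and the choice of the clean source as the quality reference, are assumptions for illustration rather than the paper's exact protocol.

# Illustrative evaluation loop for quantifying the RD impact of a degradation.
def rd_curve(src, reference, encode, measure, bitrates_kbps):
    """Return (rate, quality) points for one clip over a bitrate ladder."""
    points = []
    for br in bitrates_kbps:
        out = encode(src, br)                          # e.g. an AV1 or VP9 encode at br kbps
        points.append((br, measure(reference, out)))   # quality of decoded output vs reference
    return points

def degradation_impact(clean, degradation, encode, measure, bitrates_kbps,
                       add_degradation, bd_rate):
    """Bitrate overhead attributable to one artefact (noise, grain, flicker, shake)."""
    degraded = add_degradation(clean, degradation)     # hypothetical degradation synthesis
    clean_curve = rd_curve(clean, clean, encode, measure, bitrates_kbps)
    degraded_curve = rd_curve(degraded, clean, encode, measure, bitrates_kbps)
    return bd_rate(clean_curve, degraded_curve)        # hypothetical Bjontegaard computation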
Traditional metrics for evaluating video quality do not completely capture the nuances of the Human Visual System (HVS); however, they are simple to use when quantitatively optimizing parameters in enhancement or restoration. Modern Full-Reference Perceptual Visual Quality Metrics (PVQMs) such as the video multi-method assessment fusion (VMAF) function are more robust than traditional metrics in terms of the HVS, but they are generally complex and non-differentiable. This lack of differentiability means they cannot be readily used in optimization scenarios for enhancement or restoration. In this paper we formulate a perceptually motivated restoration framework for video and deploy it in the context of denoising by training a spatio-temporal denoising deep convolutional neural network (DCNN). We design DCNNs as differentiable proxies for both a spatial and a temporal version of VMAF, and combine these proxies with traditional losses to propose a perceptually motivated loss function for video, which is used to update the weights of the spatio-temporal denoiser. Our results show that using the perceptual loss function as a fine-tuning step yields a higher VMAF score and lower PSNR than the spatio-temporal network trained with the traditional mean squared error loss. Using the perceptual loss function for the entirety of training yields a lower VMAF and PSNR, but produces visibly less noise in the output.
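A minimal PyTorch sketch of how such a proxy-based loss could be wired up, assuming a frozen DCNN proxy that predicts a VMAF-like score in [0, 100]; the ProxyVMAF architecture and the blending weight alpha are placeholders, not the networks or values used in the paper.

# Sketch only: blending an MSE fidelity term with a differentiable VMAF-proxy term.
import torch
import torch.nn as nn

class ProxyVMAF(nn.Module):
    """Stand-in for the trained, frozen VMAF proxy; maps (distorted, reference) to a score."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, distorted, reference):
        # Concatenate the two RGB frames channel-wise and squash the output into [0, 100].
        return 100.0 * torch.sigmoid(self.net(torch.cat([distorted, reference], dim=1)))

def perceptual_loss(denoised, clean, proxy, alpha=0.5):
    """alpha blends the fidelity (MSE) and perceptual (proxy-VMAF) terms; proxy is kept frozen."""
    mse = nn.functional.mse_loss(denoised, clean)
    vmaf_term = 1.0 - proxy(denoised, clean).mean() / 100.0  # minimising this maximises predicted VMAF
    return alpha * mse + (1.0 - alpha) * vmaf_term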