Since the adoption of VP9 by Netflix in 2016, royalty-free coding standards have continued to gain prominence through the activities of the AOMedia consortium. AV1, the latest open-source standard, is now widely supported. In the early years after standardisation, HDR video tended to be underserved in open-source encoders, for a variety of reasons including the relatively small amount of true HDR content being broadcast and the challenges of RD optimisation with that material. AV1 codec optimisation has been ongoing since 2020, including consideration of the computational load. In this paper, we explore the idea of directly optimising the Lagrangian λ parameter used in the rate control of the encoders, in order to estimate the optimal rate-distortion trade-off achievable for a video clip signalled as High Dynamic Range. We show that by adjusting the Lagrange multiplier in the RD optimisation process on a frame-hierarchy basis, we are able to increase the Bjøntegaard delta rate (BD-Rate) gains by more than 3.98× on average without visually affecting the quality.
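As a rough illustration of the mechanism involved (not the encoder's actual implementation), mode and partition decisions in AV1-style encoders minimise a Lagrangian cost J = D + λ·R, and the idea above amounts to scaling λ differently for each level of the frame hierarchy. The sketch below uses hypothetical per-level multipliers and function names; they are assumptions for illustration only.

```python
# Minimal sketch of per-frame-hierarchy Lagrangian scaling (illustrative only;
# the k_level factors are hypothetical, not values from the paper).

def rd_cost(distortion, rate_bits, lam):
    """Standard Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def scaled_lambda(base_lambda, hierarchy_level, k_level):
    """Scale the encoder's default lambda by a factor tuned per temporal layer."""
    return base_lambda * k_level[hierarchy_level]

# Example: choose between two candidate modes for a block in temporal layer 2.
k_level = {0: 1.0, 1: 1.1, 2: 1.25}            # hypothetical per-layer multipliers
lam = scaled_lambda(base_lambda=80.0, hierarchy_level=2, k_level=k_level)
candidates = [(1500.0, 240), (1750.0, 150)]    # (distortion, rate in bits)
best = min(candidates, key=lambda c: rd_cost(c[0], c[1], lam))
```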
In the past ten years there have been significant developments in the optimization of transcoding parameters on a per-clip rather than per-genre basis. In our recent work we presented per-clip optimization of the Lagrangian multiplier in rate-controlled compression, which yielded BD-Rate improvements of approximately 2% across a corpus of videos using HEVC. However, in a video streaming application, the focus is on optimizing the rate/distortion tradeoff at a particular bitrate, rather than on average across a range of operating points. We observed in previous work that a particular multiplier might give BD-Rate improvements over a certain range of bitrates, but not over the entire range; using different parameters across the range would improve the overall gain. Therefore, here we present a framework for choosing the best Lagrangian multiplier on a per-operating-point basis across a range of bitrates. In effect, we are trying to find the para-optimal gain across bitrate and distortion for a single clip. In the experiments presented, we employ direct optimization techniques to estimate this Lagrangian parameter path for approximately 2,000 video clips, drawn primarily from the YouTube-UGC dataset. We optimize both for bitrate savings and for distortion metrics (PSNR, SSIM).
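A minimal sketch of the kind of per-operating-point search involved is given below, assuming a black-box encode-and-measure routine. The candidate-grid search and the `encode_and_measure` wrapper are assumptions for illustration; they are not the specific direct optimization procedure used in the paper.

```python
# Illustrative per-operating-point search over a lambda scaling factor k.
# encode_and_measure is a hypothetical callable returning (bitrate_kbps, psnr_db)
# for one clip at one target rate with the encoder's lambda scaled by k.

def best_lambda_scale(clip, target_kbps, encode_and_measure,
                      candidates=(0.6, 0.8, 1.0, 1.2, 1.5, 2.0)):
    """Pick the multiplier giving the highest quality at this operating point."""
    quality = {}
    for k in candidates:
        rate, psnr = encode_and_measure(clip, target_kbps, lambda_scale=k)
        quality[k] = psnr
    return max(quality, key=quality.get)

# Repeating this at each target bitrate traces out a per-clip "lambda path"
# across the operating points, rather than one multiplier for the whole curve.
```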
Optimising the parameters of a video codec for a specific video clip has been shown to yield significant bitrate savings. In particular, per-clip optimisation of the Lagrangian multiplier in rate-controlled compression has led to BD-Rate improvements of up to 20% using HEVC. Unfortunately, this is computationally expensive, as it requires repeated measurement of rate-distortion curves, amounting to in excess of fifty video encodes to generate that level of savings. This work focuses on reducing the computational cost of repeated video encodes by using a lower-resolution clip as a proxy. Features extracted from the low-resolution clip are then used to learn the mapping to an optimal Lagrange multiplier for the original-resolution clip. In addition to reducing the computational cost and encode time by using lower-resolution clips, we also investigate the use of older but faster codecs such as H.264 to create the proxies. We show that the computational load is reduced by up to 22 times using 144p proxies, while more than 60% of the possible gain at the original resolution is retained. Our tests are based on the YouTube UGC dataset and use the same computational platform throughout; hence our results reflect a practical instance of the adaptive bitrate encoding problem. Further improvements are possible by optimising the placement and sparsity of the operating points required for the rate-distortion curves. Our contribution is to reduce the computational cost of per-clip optimisation of the Lagrangian multiplier while maintaining the BD-Rate improvement.
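A rough sketch of the proxy pipeline described is given below, assuming scikit-learn for the regression stage. The feature set, the random-forest regressor, and the toy numbers are assumptions chosen for illustration, not the features or model reported in the paper.

```python
# Illustrative proxy pipeline: encode a 144p proxy cheaply, summarise its
# rate-distortion behaviour, and predict the lambda multiplier to use at the
# original resolution. Features and regressor are assumptions, not the paper's.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def proxy_features(proxy_rd_points):
    """Summarise a low-resolution RD curve as a small feature vector.
    proxy_rd_points: list of (bitrate_kbps, psnr_db) from fast H.264 proxy encodes."""
    rates = np.array([r for r, _ in proxy_rd_points], dtype=float)
    psnrs = np.array([p for _, p in proxy_rd_points], dtype=float)
    slope = np.polyfit(np.log(rates), psnrs, 1)[0]      # crude RD-curve slope
    return np.array([rates.mean(), psnrs.mean(), slope])

# Training targets come from the expensive full-resolution per-clip search
# (toy numbers below, purely for illustration).
X_train = np.array([[300.0, 34.2, 4.1], [450.0, 36.0, 3.2]])
y_train = np.array([1.2, 0.8])
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

predicted_k = model.predict(proxy_features([(200, 33.0), (400, 35.5)]).reshape(1, -1))
```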
Motion estimation is a key component of any modern video codec. Our understanding of motion and its estimation from video has come a very long way since 2000; more than 135 different algorithms have recently been reviewed by Scharstein et al. (http://vision.middlebury.edu/flow/). These newer algorithms differ markedly from block matching, which has been the mainstay of video compression for some time. This paper presents comparisons of H.264 and MPEG-4 compression using different motion estimation methods. In so doing, we also present methods for adapting pre-computed motion fields for use within a codec. We do not observe significant gains with the methods chosen in terms of the rate-distortion tradeoff, but the results reveal a significantly more complex interrelationship between motion and compression than might be expected. Much more remains to be done to extend this comparison to the emerging standards, but these initial results show that there is value in these explorations.
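One simple way to adapt a dense, pre-computed optical flow field into the block-based motion vectors a codec expects is to pool the flow over each block and snap it to the codec's motion-vector grid. The sketch below (median pooling, quarter-pel quantisation, 16×16 blocks) illustrates that general idea under those assumptions; it is not necessarily the adaptation method evaluated in the paper.

```python
# Illustrative adaptation of a dense flow field (H x W x 2) into one motion
# vector per 16x16 block, quantised to quarter-pel precision.

import numpy as np

def flow_to_block_mvs(flow, block=16):
    h, w, _ = flow.shape
    mvs = np.zeros((h // block, w // block, 2))
    for by in range(h // block):
        for bx in range(w // block):
            patch = flow[by*block:(by+1)*block, bx*block:(bx+1)*block]
            mv = np.median(patch.reshape(-1, 2), axis=0)   # robust to local outliers
            mvs[by, bx] = np.round(mv * 4) / 4             # quarter-pel grid
    return mvs

# Example: a synthetic flow field with uniform motion of (1.3, -0.6) pixels.
flow = np.tile(np.array([1.3, -0.6]), (64, 64, 1))
block_mvs = flow_to_block_mvs(flow)   # every block maps to (1.25, -0.5)
```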
Demosaicing remains a critical component of modern digital image processing, with a direct impact on image quality. Conventional demosaicing methods yield relatively poor results, especially the lightweight methods used for fast processing. In contrast, recent works utilizing deep convolutional neural networks have significantly improved upon previous methods, increasing both quantitative and perceptual performance. This approach has markedly reduced artifacts, but there remains scope for meaningful improvement. To further this research, we investigate the use of alternative architectures and training parameters to reduce the incurred errors, especially visually disturbing demosaicing artifacts such as moiré, and provide an overview of current methods to better understand their expected performance. Our results show a U-Net-style network that outperforms previous methods in quantitative and perceptual error while remaining computationally efficient enough for use in GPU-accelerated applications as an end-to-end demosaicing solution.
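A minimal PyTorch sketch of a U-Net-style demosaicing network is given below, assuming a 4-channel packed Bayer (RGGB) input and a full-resolution RGB output. The depth and channel widths are illustrative assumptions, far smaller than the architecture evaluated in the paper.

```python
# Tiny U-Net-style demosaicing sketch (illustrative only; layer sizes are
# assumptions, not the paper's architecture).

import torch
import torch.nn as nn

class TinyUNetDemosaic(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.mid = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                 # final 2x upsample recovers full Bayer resolution
                                 nn.ConvTranspose2d(32, 3, 2, stride=2))

    def forward(self, packed_bayer):                 # (N, 4, H/2, W/2) packed RGGB
        e = self.enc(packed_bayer)
        m = self.mid(torch.relu(self.down(e)))
        u = self.up(m)
        return self.dec(torch.cat([u, e], dim=1))    # (N, 3, H, W) RGB

# x = torch.randn(1, 4, 64, 64)       # packed 128x128 Bayer mosaic
# rgb = TinyUNetDemosaic()(x)         # -> (1, 3, 128, 128)
```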
Temporal and spatial random variation of luminance in images, or 'flicker', is a typical degradation observed in archived film and video. The underlying premise of typical flicker reduction algorithms is that each image must be corrected for a spatially varying gain and offset. These parameters are estimated over the stationary regions of the image, so the performance of such an algorithm depends crucially on the identification of stationary image regions. Position fluctuations are also a common artefact, resulting in a random 'shake' of each film frame. For removing both, the key is to reject regions showing local motion or other outlier activity; the parameters are then estimated mostly on the part of the image undergoing the dominant motion. A new algorithm that simultaneously deals with global motion estimation and flicker is presented. The final process is based on a robust application of weighted least squares, in which the weights also classify portions of the image as local or global. The paper presents results on severely degraded sequences showing evidence of both flicker and random shake.
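As an illustration of the gain/offset model underlying this family of methods, a degraded frame I can be related to a reference frame R by I(x) ≈ a·R(x) + b, with (a, b) estimated by weighted least squares, where low weights flag pixels dominated by local motion or other outliers. The sketch below is a generic iteratively reweighted fit under those assumptions, not the paper's specific joint motion/flicker estimator.

```python
# Generic iteratively reweighted least-squares fit of a global gain/offset
# flicker model I ~ a*R + b; small weights mark pixels treated as local motion
# or outliers. Illustrative only, not the paper's exact estimator.

import numpy as np

def fit_gain_offset(frame, ref, iters=5, sigma=10.0):
    I, R = frame.ravel().astype(float), ref.ravel().astype(float)
    w = np.ones_like(I)
    for _ in range(iters):
        A = np.stack([R * np.sqrt(w), np.sqrt(w)], axis=1)
        y = I * np.sqrt(w)
        (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = I - (a * R + b)
        w = np.exp(-(residual / sigma) ** 2)   # downweight outliers / local motion
    return a, b, w.reshape(frame.shape)

# Flicker-corrected frame: (frame - b) / a, trusted where the weights are high.
```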