1. INTRODUCTION

In recent years, the growth in the delivery of video at scale for broadcast and streaming applications (from Netflix, YouTube, Disney, etc.) has inspired further research into content-adaptive transcoding. The goal is to deliver high-quality content at progressively lower bitrates by adapting the transcoder to each input at a fine-grained level of control. In 2013, YouTube was the first to adopt this strategy for its User-Generated Content (UGC), building a pipeline driven by clip popularity that re-processes a clip with an enhanced pre-processor in combination with a different built-in transcoder. Around the same time, Netflix’s seminal work on per-clip and per-shot encoding [4] for High Value Content (HVC) videos showed that an exhaustive search of the coding parameter space can lead to significant gains in Rate-Distortion (RD) tradeoffs per clip. These gains offset the high one-time computational cost of encoding, as the same encoded clip may be streamed to millions of viewers across many different Content Delivery Networks (CDNs) [5], thus effectively saving bandwidth and network resources. That idea has since been revisited and made more efficient by applying the Viterbi algorithm across shots and parameter spaces [6].

Over the past years, many researchers have focused on the optimisation of a high-level parameter (target bitrate, quantisation factor, or objective quality) to generate an optimal bitrate ladder for a clip as part of Adaptive Bitrate (ABR) streaming [7–9]. In our previous work [10], we showed that the RD tradeoff can be directly addressed by applying a numerical optimisation scheme to estimate the appropriate Lagrangian multiplier λ for a given clip (see Section 2) for Standard Dynamic Range (SDR) videos.
We observed an average BD-rate improvement of 1.9% for HEVC, 1.3% for VP9 [11], and 0.5% for AV1 [12]. In our latest work [12], we further demonstrated that additional BD-rate gains, from 0.5% to 4.9%, could be achieved for AV1 by adopting a per-frame-type optimisation.

In this paper, we explore the idea of λ optimisation on High Dynamic Range (HDR)/Wide Color Gamut (WCG) material [13]. HDR/WCG systems can capture, process, and reproduce a scene conveying the full range of perceptible shadow and highlight details, beyond what Standard Dynamic Range (SDR) video systems can offer. As in our latest work [12], we propose a content-adaptive transcoder optimisation at a global and a deeper frame-type level. The core new ideas are i) consideration of RD Optimisation (RDO) for HDR content, ii) optimisation of the RDO parameters on a frame-hierarchy basis, and iii) investigation of various convergence criteria that minimise the computational load. Our experiments in Section 5 demonstrate that frame-based tuning of the video encoder can lead to an average BD-rate gain of 1.63% (best recorded gain of 9.3%) compared to the standardised method. Moreover, the average gains per shot range between 0.58% and 3.43% for HDR video content in AV1.

Section 2 gives an overview of previous research work and the definition of λ in rate control. Section 3 explains the proposed methodology as well as the multi-dimensional optimisation. Section 4 then details the experimental set-up, including the test sequences, the keyframe combination selection, and the framework implementation. Section 5 reports on the experimental findings.

2. BACKGROUND

The work of Sullivan et al. [14] laid the foundations for optimising the RD tradeoff in modern video codecs. By taking a Lagrange multiplier approach, the joint optimisation problem is posed as the minimisation of J = D + λR, where λ is the Lagrange multiplier controlling the tradeoff.
This idea is the basis of the RDO process, used especially in making mode decisions in modern codecs. The independent variable in this optimisation is usually qp, a quantiser step size. Increasing qp reduces the rate R but increases the distortion D. Likewise, a different choice of λ yields a different (R, D) pair. Different codecs have devised different recipes to derive the optimal λ value for RDO through an empirical relationship with qp. In libaom-AV1 [1], λ is empirically related to qi (Quantizer Index: ≈ 4 × qp in the AV1 codebase) as follows:

$$\lambda = A \cdot q_{dc}^{2} \qquad (1)$$

where A is a constant depending on the frame type (3.2 ≤ A ≤ 3.3) and $q_{dc} = f(qi, A)$ is defined through a discrete-valued Look-Up Table (LUT) (0 ≤ qi ≤ 255 for AV1). This λ-qp relationship is not necessarily optimal for a particular clip, because the empirical relationship was derived for optimality over an entire test corpus. To maximise gains, λ should be content dependent.

Per Clip λ Optimisation. The idea of adapting λ based on video content is not entirely new. Zhang and Bull [15] altered λ based on distortion statistics on a frame basis for HEVC. In our previous work [10, 11, 16], we introduced the idea of an adaptive λ on a clip basis, using a single modified $\lambda = k\lambda_o$ across all the frames in a clip. Here, $\lambda_o$ represents the default value given by the relevant empirical relationship, e.g. Eq. (1). In order to find the optimal λ value, we deployed numerical optimisers (Brent’s method and golden-section search [17]) that minimised the BD-rate as a cost function. Later work [10] considered the use of Machine Learning techniques to reduce the required computational load.

Per Clip, Per Frame-Type λ Optimisation. In our latest work [12], we showed that this method of global λ tuning yields average BD-rate gains of only 0.539% and 0.097% for AV1 and HEVC, respectively.
These modest improvements are probably due to the fact that modern video encoders are content-adaptive by nature and include many new improvements, such as partition tools, Inter/Intra prediction tools, and a modern hierarchical reference frame structure. Another important aspect to consider is that the heuristics and empirical shortcuts used in this content-adaptive implementation of the Lagrangian parameter deviate from classical RDO theory, which normally requires λ to be constant across the sequences over which distortion is measured. For example, in Eq. (1) λ changes with frame type. Motivated by this observation, in our latest work [12] we studied the effect of isolating the optimisation of λ for different frame types. Results on SDR sequences showed that optimising λ purely for Keyframes (KF), Golden Frames (GF), and Alternate-Reference Frames (ARF) leads to average BD-rate gains of 4.92%, compared to only 0.54% for global λ optimisation.

HDR in AV1. The quality of compressed 4K and 8K HDR content was evaluated by Pourazad et al. [18], drone content by Topiwala and Dai [19], and gaming content by Barman and Martini [20]. More relevant here is the work of Zhou et al. [21] for HEVC, which expresses the distortion D in terms of the HDR-VDP-2 quality metric [22]. They presented an algorithm for the prediction of λ at the CTU level, which resulted in a 5% BD-rate improvement w.r.t. a reference implementation of HEVC (HM16.19).

3. DIRECT λ OPTIMISATION IN AV1

As noted in Section 2, the encoder determines λ independently on a frame basis. In this work, we explore the impact of treating λ optimisation as a multi-variable search problem on a frame basis for HDR content. The following sections expand on our main strategies for the direct optimisation of the λ parameter.

BD-rate Optimisation. For the nth RDO decision in clip m, we propose that $\lambda_n = k_n\lambda_o$.
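To make the scaling concrete, the following minimal sketch applies a per-frame-type multiplier k to a default λ. It assumes a default of the form A · q_dc², and the function names, frame-type labels, and per-type A values are illustrative placeholders chosen within the stated range 3.2 ≤ A ≤ 3.3; none of them are taken from libaom.

```python
from typing import Dict

# Placeholder constants: the text only states 3.2 <= A <= 3.3 depending on
# frame type. The exact per-type values below are assumptions.
ASSUMED_A: Dict[str, float] = {"KF": 3.3, "GF": 3.25, "ARF": 3.25, "OTHER": 3.2}


def default_lambda(frame_type: str, q_dc: float) -> float:
    """Default multiplier lambda_o, assuming the form A * q_dc**2."""
    a = ASSUMED_A.get(frame_type, ASSUMED_A["OTHER"])
    return a * q_dc ** 2


def scaled_lambda(frame_type: str, q_dc: float,
                  k_by_type: Dict[str, float]) -> float:
    """lambda_n = k_n * lambda_o; frame types without an entry keep k = 1."""
    k = k_by_type.get(frame_type, 1.0)
    return k * default_lambda(frame_type, q_dc)


if __name__ == "__main__":
    # Hypothetical multipliers: one for KF, a shared one for GF and ARF.
    k = {"KF": 1.4, "GF": 0.8, "ARF": 0.8}
    for frame_type in ("KF", "GF", "ARF", "OTHER"):
        print(frame_type, scaled_lambda(frame_type, q_dc=40.0, k_by_type=k))
```

Keeping the multipliers in a per-type mapping means that a global search (one k for every type) and a frame-type search (k set only for selected types) go through the same code path.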
We estimate $\mathbf{k} = [k_1, \ldots, k_N]$ (where we assume N RDO decisions) to maximise the BD-rate gain using MS-SSIM [23] as the quality metric ($Q_m$). The cost function $C_m(\mathbf{k})$ can therefore be formulated as:

$$C_m(\mathbf{k}) = \int_{Q_1}^{Q_2} \left( \log R_m(\mathbf{k}\lambda_o, Q) - \log R_m(\lambda_o, Q) \right) dQ \qquad (2)$$

where $R_m(\mathbf{k}\lambda_o, Q)$ is the bitrate of the mth clip at quality Q, using $\lambda = \mathbf{k}\lambda_o$ for the N RDO decisions, and $Q_1$, $Q_2$ are defined as usual [3]. $R_m(\cdot, \cdot)$ is derived from the MS-SSIM-based RD curve generated using P qp measurements. Here we use P = 5: qp = {27, 39, 49, 59, 63}. The flow of the optimisation framework is reported in Algorithm 1. We can see that repeated computations of the BD-rate are required, and this incurs a huge computational cost. To address this, we deploy the idea of proxies for parameter selection, as proposed by Wu et al. [24, 25]. They observe that using different speed settings of the encoder at the same target quality/quantizer level results primarily in bitrate differences that are directly proportional to the content complexity (see Section 5.1). That idea also extends to the use of lower-resolution proxies. Therefore, we can reduce the computational load by performing the optimisation using faster encoder presets and lower-resolution proxies.

Multi-Dimensional Optimiser. Our previous study on finding the optimal λ multiplier in AV1 used Brent’s line search method [12]. The focus was on applying a modifier k to only one sub-module in the encoder. Here, however, we explore a multi-dimensional search for the optimal λ across multiple frame types, each associated with a different k. For this multi-dimensional search, there are various options, such as the Nelder-Mead Simplex [26], Conjugate Gradient [27], and Powell’s method [28]. In order to select a suitable optimiser, we conducted a simple analysis by carrying out an exhaustive grid search on a single video to study the surface of our objective function.
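As a rough illustration of a derivative-free, line-search-based search over two multipliers, the sketch below runs cyclic coordinate search with golden-section line minimisation. This corresponds only to the initial coordinate-direction sweep of Powell's method and omits its direction-set update, so it is not the optimiser used in the experiments. The cost function is a made-up smooth surface standing in for the BD-rate cost; all names, bounds, and constants are assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618


def golden_section(f: Callable[[float], float], lo: float, hi: float,
                   tol: float = 1e-3) -> float:
    """Minimise a unimodal 1-D function on [lo, hi] by golden-section search."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:            # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:                  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def coordinate_search(cost: Callable[[Sequence[float]], float],
                      k0: Sequence[float], bounds: Tuple[float, float],
                      sweeps: int = 1) -> Tuple[List[float], float]:
    """Line-minimise along each coordinate in turn, for a number of sweeps."""
    k = list(k0)
    for _ in range(sweeps):
        for i in range(len(k)):
            def along_axis(x: float, i: int = i) -> float:
                trial = k.copy()
                trial[i] = x
                return cost(trial)
            k[i] = golden_section(along_axis, bounds[0], bounds[1])
    return k, cost(k)


def toy_bd_rate_cost(k: Sequence[float]) -> float:
    """Made-up stand-in for the BD-rate cost, with its minimum at (1.8, 1.3).

    A real cost would encode the clip at every qp with the given multipliers
    and return the BD-rate against the default-lambda encodes.
    """
    k_kf, k_gf_arf = k
    return (0.5 * (k_kf - 1.8) ** 2 + 0.8 * (k_gf_arf - 1.3) ** 2
            + 0.3 * (k_kf - 1.8) * (k_gf_arf - 1.3) - 1.5)


if __name__ == "__main__":
    for sweeps in (1, 3):
        k_opt, c_opt = coordinate_search(toy_bd_rate_cost, [1.0, 1.0],
                                         (0.6, 5.4), sweeps=sweeps)
        print(sweeps, [round(v, 3) for v in k_opt], round(c_opt, 4))
```

Every call to the cost function would stand for a full set of encodes in practice, so the number of sweeps and the line-search tolerance are the main levers on run time.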
For this study, we chose the clip NocturneRoom from the AOM Common-Testing-Configuration (AOM-CTC) [29] set, and optimised λ for two different frame types: $\lambda_{KF}$ for the KFs and $\lambda_{GF/ARF}$ for the GF/ARFs. For all other frame types, λ was set to the default. The grid search range for the multipliers of both $\lambda_{KF}$ and $\lambda_{GF/ARF}$ is 0.6 to 5.4, in steps of 0.1. This gives 49 values per axis and hence 49 × 49 = 2,401 anchor points, each encoded at the 5 qp settings, for a total of 12,005 RD points for analysis. Figure 1 shows the contour plot of the BD-rate (%) (MS-SSIM) objective function for this sample clip. The surface is clearly smooth, and a gradient-based method is expected to converge to a sub-optimal solution (local minimum) due to the very low gradient. This observation was confirmed by testing the Nelder-Mead Simplex method [26] and the Conjugate Gradient method [27]. Both converged erroneously after the first iteration, as the gradient was very close to zero. Therefore, we explored line-search constrained methods. One of the best performing was the modified Powell method [28], which succeeded in reaching the global minimum for the test clip (see the red lines in Figure 1).

4. EXPERIMENTAL SETUP

4.1 4K HDR Corpus

For our experimental studies, we formed a video corpus consisting of 50 video clips (6500 frames) curated from various public sources. All the videos are normalized to BT.2020 color primaries with the SMPTE 2084 Perceptual Quantizer (PQ) transfer function and represented in the YUV colourspace inside YUV-Y4M containers. All the conversions and normalization of the clips are implemented with HDRTools [30]. The configuration file for the conversion to YUV space with PQ signal is available on our project page∗. Each of the 50 clips contains 130 frames at a resolution of 3840x2160 or 4096x2160, and the clips can be further grouped into 7 shot groups. Figure 2 illustrates sample frames of the dataset and Table 1 gives a short description of the content of these 7 shot groups.
More information on the dataset, including the computed Spatial and Temporal Information (SI and TI) [31] and Dynamic Range (DR), can be found on our project page∗.

Table 1: High-level description of the shots.
4.2 Keyframe Selection

Reference frames (henceforth called keyframes) in AV1 typically contain 5 to 10 times more bits than other frames. Therefore, we target the optimisation of the bit allocation of these keyframes. In AV1, in order to code an Inter frame, up to 8 keyframes [35] are used as references. The encoder chooses from multiple frames in both the forward and backward directions [36]. For the simulations presented, we consider multiple combinations of the 3 keyframe types: the reference Intra-coded frame (KF); the ARF_FRAME (ARF), which is used in prediction but does not appear in the display; and the GOLDEN_FRAME (GF), an Inter-coded frame that is coded at higher quality.

4.3 Framework Implementation

For the simulations, the Random-Access (RA) encoding mode was chosen as per AOM-CTC [29]. This mode is commonly used for streaming as it allows users to randomly seek to any frame of the clip. We deployed a stable release of AV1 (libaom-av1-3.2.0, 287164d) with modifications to allow k to propagate to the desired mode from a command-line argument. The objective metrics for Quality and Rate (RD measurements) at the selected qp settings were computed using libvmaf [37], a standard open-source video quality evaluation library. Our software framework for performing these experiments with AV1 is based on AreWeCompressedYet [38].

5. EXPERIMENTS & RESULTS

5.1 Proxy Processing

As discussed earlier, optimisation requires many encodes, and we need to consider faster proxy presets to make these experiments practical. We have investigated three proxy modes for use during the optimisation: (4K S2-S2), the default non-proxy encode at the original 4K resolution using the AV1 Speed 2 preset, as in the final setting; (4K S2-S6), which encodes videos at 4K resolution using the AV1 Speed 6 preset; and (1080p S2-S6), which operates at a (Lanczos 5) downsampled 1080p resolution with speed preset 6. Table 2 presents the BD-rate gains on the 4 sequences of the av2-g1-hdr-4k set from AOM-CTC.
The optimisation is performed in this case for a single global k using Brent’s method [17]. It is clear that using the proxy method (1080p S2-S6) reduces the encoding complexity by an average of 4.8×, with negligible degradation in quality (BD-rate) when compared against the full-resolution optimisation mode (4K S2-S2). We also note that the BD-rate gains at (4K S2-S6) are almost identical to those at (1080p S2-S6), but with about 30% slower encoding speed. Given that the processing time for our subsequent experiments is in the order of hundreds of hours, we use this (1080p S2-S6) proxy setting in the rest of the study. Hence, for the results reported in Table 3, we use that proxy to estimate the optimal values of $k_1, \ldots, k_n$, and then evaluate the BD-rate gains using the original material, i.e. 4K with the S2 preset. We do of course observe some differences between this approach and using 4K S2 throughout, but it is the more pragmatic approach, as validated by our results in Table 2.

Table 2: Proxy encoding time (hours), estimated λ multiplier value, and BD-rate gains (%) MS-SSIM for the different proxy encoding speeds and video resolutions.
Table 3: Per-frame-type λ-optimisation results for multiple combinations of frame types. k values are obtained using (1080p S6) proxy settings. BD-rates (BDR) are calculated using (4K S2) as an anchor.
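A BD-rate figure against an anchor summarises the average bitrate difference between two RD curves at equal quality. As a rough sketch of how such a figure can be obtained from a handful of RD points, the code below integrates the difference in log-rate over the overlapping quality range. It uses piecewise-linear interpolation rather than the polynomial fit of Bjontegaard's original method, and the function names and sample RD points are made up for illustration.

```python
import math
from typing import Sequence, Tuple

RDPoint = Tuple[float, float]  # (bitrate, quality), e.g. (kbps, MS-SSIM in dB)


def _log_rate_integral(points: Sequence[RDPoint],
                       q_lo: float, q_hi: float) -> float:
    """Integrate log10(rate) over [q_lo, q_hi], linear between RD points."""
    pts = sorted((q, math.log10(r)) for r, q in points)
    total = 0.0
    for (q0, l0), (q1, l1) in zip(pts, pts[1:]):
        a, b = max(q0, q_lo), min(q1, q_hi)
        if b <= a:
            continue  # segment lies outside the range (or has zero width)
        slope = (l1 - l0) / (q1 - q0)
        la = l0 + slope * (a - q0)
        lb = l0 + slope * (b - q0)
        total += 0.5 * (la + lb) * (b - a)  # trapezoid on this segment
    return total


def bd_rate_percent(anchor: Sequence[RDPoint],
                    test: Sequence[RDPoint]) -> float:
    """Average bitrate change (%) of `test` versus `anchor` at equal quality."""
    q_lo = max(min(q for _, q in anchor), min(q for _, q in test))
    q_hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if q_hi <= q_lo:
        raise ValueError("RD curves do not overlap in quality")
    diff = (_log_rate_integral(test, q_lo, q_hi)
            - _log_rate_integral(anchor, q_lo, q_hi)) / (q_hi - q_lo)
    return (10.0 ** diff - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up RD points, one per qp setting (five, as in the text).
    anchor = [(9000.0, 19.5), (4200.0, 17.8), (2100.0, 16.0),
              (1000.0, 14.1), (700.0, 13.2)]
    # Hypothetical tuned encode: same quality at 5% lower bitrate.
    tuned = [(0.95 * r, q) for r, q in anchor]
    print(round(bd_rate_percent(anchor, tuned), 3))
```

Under this sign convention a negative value denotes a bitrate saving relative to the anchor.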
5.2 Multiple Frame Types λ Optimisation Performance

One of the objectives of this work is to compare different combinations of frame-type-dependent λ optimisations, with the ultimate aim of finding the best λ optimisation strategy. First, we considered 4 modes, as in our previous work [12]. In particular, we set $\lambda = k\lambda_o$ for some grouped frame types, and kept $\lambda = \lambda_o$ for all other frame types. These four modes are: i) All Frames, the global λ tuning method where we set the same multiplier k for all frames, as previously proposed by Ringis et al. [11]; ii) KF, where we tune $\lambda = k\lambda_o$ for KF only; iii) GF-ARF, which refers to λ optimisation for GF and ARF frames; and iv) KF-GF-ARF, which refers to optimising λ for KF, GF, and ARF frames. In addition to the above four modes, we introduced a new search method, Powell [KF, GF/ARF], which is a multi-dimensional joint search where we deploy two k values, with $k_1$ for KF and $k_2$ for GF and ARF frames. Thus, $\lambda_{KF} = k_1\lambda_o$ and $\lambda_{GF/ARF} = k_2\lambda_o$, and for all other frames we use the default λ value.

Table 3 reports the BD-rate (MS-SSIM, %), MS-SSIM (dB), and VMAF [39] gains for these different optimisation strategies for the whole corpus, which is divided into 7 different shot groups (see Table 1). The underlined result shows the highest gains in terms of BD-rate that the proposed tuning brings compared to (4K S2). The results presented are averaged on a shot-group basis. Also, the minimum and average BD-rate in the shot group are recorded. Another perspective on the resulting BD-rates from all tested clips is given by the histograms of the BD-rate values in Figure 3.

First, we observe that the new values of k are, on average, significantly different from k = 1, hence verifying our initial hypothesis of a “better” λ value. Inspecting the Brent optimiser results from both Table 3 and Figure 3, it transpires that the KF-GF-ARF method achieves the best BD-rate gains on average (1.12%).
Analysing the results on a per-shot basis, we were able to see significant improvements in BD-rate for certain shots. The best average gains recorded were for Sol-Levante, where the BD-rate gain improved from 0.79% to 2.94%, followed by Cosmos with an improvement from 1.44% to 2.74%. Another important observation is that the multi-dimensional search method, Powell [KF, GF/ARF], appears to be the overall top performer, with average BD-rate gains of 1.63%. The histogram in Figure 3e also shows that the improvement is consistent for the majority of the videos. This is also evident in terms of bitrate, as we achieve a bitrate reduction of 0.64% with KF-GF-ARF and 4.25% with Powell [KF, GF/ARF] at QP 39. We also observed that these improved bitrate savings come with a minimal average loss of 0.06 dB in MS-SSIM and 0.35 in VMAF.

From Table 3, we notice that clips within the same shot group respond differently. These deviations can be partly explained by variations in spatial and temporal information (see project page∗). Analysing the distribution of results obtained with the Powell method, we observed that the higher BD-rate gains (> 4%) were obtained for clips with high spatial information/complexity (5 clips). Videos with lower temporal information exhibited higher gains. Videos with low temporal and low spatial complexity showed no improvement. Different optimisation modes exhibited very different distributions. For instance, in the classic tuning method, BD-rate improvements were observed for clips with SI between 400 and 500, while tuning λ for KF/GF gave large losses for the same range of videos.

5.3 Convergence Speed of Optimisation Methods

The Powell method is computationally expensive, with roughly twice the computational cost of the Brent optimisation. In practice, this means an average of 203 hours per clip for Powell versus 118 hours for Brent.
Furthermore, the Brent search for the optimal $k_1$ and $k_2$ required an average of 15 hours, while the Powell method’s joint search for $(k_1, k_2)$ needed around 87 hours. The final optimisation step, i.e. the non-proxy encode at the slower speed preset, took around 107 hours per clip on average for all three modes. Interestingly, for the Powell method and for most of the clips, we were able to achieve results very close to the final iteration using only one iteration of the algorithm. Relative to the BD-rate at iteration 1, the BD-rate at the final iteration (Powell method) differed by only 9% on average, with a median of 2.4%. Hence, in practice, just one iteration of Powell is good enough for at least 50% of the clips. Taking all of the above into consideration, we can reduce the total computational cost by half just by reducing the number of iterations of the Powell search.

5.4 HDR vs. SDR λ Optimisation

It is instructive to consider whether there is any difference in gains between equivalent HDR and SDR material. We therefore curated a subset of 39 clips from the current dataset for which an SDR version of the sequence was released along with the HDR data by the content producer. Table 4 presents the SDR/HDR results for this subset of sequences. The anchor for the BD-rate computations is the (4K S2) case. Overall, the average gains for HDR and SDR are comparable, with average BD-rate gains of 2.47% for SDR vs. 1.89% for HDR. Comparing the distributions of BD-rate between SDR and HDR directly, we notice that for 82% of the clips we have very similar BD-rate gains (±1%).

Table 4: λ optimisation results for the same set of sequences represented in the SDR and HDR domains, where the k values are obtained using (1080p S6) proxy settings. BD-rates (BDR) are calculated using (4K S2).
We need to note that this comparison between HDR and SDR must, however, be nuanced by the fact that the same distortion metric (MS-SSIM) is applied in both HDR and SDR. Using the same distortion metric for both allows us to make a direct comparison, but MS-SSIM was not designed for HDR. To the best of our knowledge, there is no objective quality metric available at the moment that could facilitate a direct comparison of SDR and HDR. Nevertheless, the point here is that HDR optimisation and SDR optimisation are quite different. In particular, Table 4 shows that the average estimated value of λ is quite different for SDR and HDR.

6. CONCLUSION

We have presented a new method of per-clip optimisation based on the rate control λ multiplier in AV1 for different frame types. The proposed method was tested on a 4K HDR corpus of 50 videos. We reported improved average BD-rate gains of 1.6% for the proposed per-frame-type per-clip optimisation, compared to 0.4% for the global per-clip λ optimisation. The proposed method showed improvements of up to 3.4% on average for certain shots compared to the method of global λ tuning. The best improvements in BD-rate range from 2.1% to 9.3% with our proposed method. We have also shown that the computational complexity of the optimisation process can be mitigated by employing proxy settings and restricting the number of iterations used by the Powell optimiser. Overall, our proposed multi-variable Powell-based optimiser gave the best improvements on average. Future work will focus on exploring the current implementation of deriving λ from the quantiser, along with a more in-depth analysis of the quality aspect of the results, including a subjective study.

ACKNOWLEDGMENTS

This project is funded under the Disruptive Technology Innovation Fund, Enterprise Ireland, Grant No DT-2019-0068, and the ADAPT-SFI Research Center, Ireland.
REFERENCES

[1] Chen, Y., Mukherjee, D., Han, J., Grange, A., Xu, Y., Liu, Z., Parker, S., Chen, C., Su, H., Joshi, U., Chiang, C.-H., Wang, Y., Wilkins, P., Bankoski, J., Trudeau, L., Egge, N., Valin, J.-M., Davies, T., Midtskogen, S., Norkin, A., and de Rivaz, P., “An overview of core coding tools in the av1 video codec,” in Picture Coding Symposium (PCS), 41–45 (2018).
[2] Wu, P.-H., Katsavounidis, I., Lei, Z., Ronca, D., Tmar, H., Abdelkafi, O., Cheung, C., Amara, F. B., and Kossentini, F., “Towards much better svt-av1 quality-cycles tradeoffs for vod applications,” Applications of Digital Image Processing XLIV, 11842, 236–256, SPIE (2021).
[3] Bjontegaard, G., “Calculation of average PSNR differences between RD curves; VCEG-M33,” ITU-T SG16/Q6 (2001).
[4] Aaron, A., Li, Z., Manohara, M., De Cock, J., and Ronca, D., “Netflix Technology Blog - Per-Title Encode Optimization,” (2019). https://medium.com/netflix-techblog/per-title-encode-optimization-7e99442b62a2
[5] Conklin, G. J., Greenbaum, G. S., Lillevold, K. O., Lippman, A. F., and Reznik, Y. A., “Video coding for streaming media delivery on the internet,” IEEE Trans. on Circuits and Systems for Video Technology (2001). https://doi.org/10.1109/76.911155
[6] Katsavounidis, I. and Guo, L., “Video codec comparison using the dynamic optimizer framework,” Applications of Digital Image Processing XLI, 10752, SPIE (2018). https://doi.org/10.1117/12.2322118
[7] Reznik, Y. A., Lillevold, K. O., Jagannath, A., Greer, J., and Corley, J., “Optimal design of encoding profiles for abr streaming,” in Proceedings of the 23rd Packet Video Workshop, 43–47 (2018).
[8] Bentaleb, A., Taani, B., Begen, A. C., Timmerer, C., and Zimmermann, R., “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP,” IEEE Communications Surveys Tutorials, 21(1) (2019). https://doi.org/10.1109/COMST.2018.2862938
[9] Katsenou, A. V., Sole, J., and Bull, D. R., “Efficient Bitrate Ladder Construction for Content-Optimized Adaptive Video Streaming,” IEEE Open Journal of Signal Processing, 2, 496–511 (2021). https://doi.org/10.1109/OJSP.2021.3086691
[10] Ringis, D. J., Pitié, F., and Kokaram, A., “Near optimal per-clip lagrangian multiplier prediction in hevc,” in 2021 Picture Coding Symposium (PCS), 1–5 (2021).
[11] Ringis, D. J., Pitié, F., and Kokaram, A., “Per-clip adaptive lagrangian multiplier optimisation with low-resolution proxies,” in International Society for Optics and Photonics, 115100E (2020).
[12] Vibhoothi, Pitié, F., and Kokaram, A., “Frame-type Sensitive RDO Control for Content-Adaptive-encoding,” arXiv (2022).
[13] ITU-R, “BT2100-2: image parameter values for high dynamic range television for use in production and international programme exchange,” (2018).
[14] Sullivan, G. J. and Wiegand, T., “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, 15(6), 74–90 (1998). https://doi.org/10.1109/79.733497
[15] Zhang, F. and Bull, D. R., “Rate-distortion optimization using adaptive lagrange multipliers,” IEEE Trans. on Circuits and Systems for Video Technology, 29(10), 3121–3131 (2019). https://doi.org/10.1109/TCSVT.76
[16] Ringis, D. J., Pitié, F., and Kokaram, A., “Per clip lagrangian multiplier optimisation for (HEVC),” Electronic Imaging, 2020(12) (2020).
[17] Flannery, B. P., Press, W. H., Teukolsky, S. A., and Vetterling, W., “Numerical recipes in c,” 24, 78, Press Syndicate of the University of Cambridge, New York (1992).
[18] Pourazad, M. T., Sung, T., Hu, H., Wang, S., Tohidypour, H. R., Wang, Y., Nasiopoulos, P., and Leung, V. C., “Comparison of Emerging Video Compression Schemes for Efficient Transmission of 4K and 8K HDR Video,” in IEEE International Mediterranean Conference on Communications and Networking (2021). https://doi.org/10.1109/MeditCom49071.2021.9647504
[19] Topiwala, P. and Dai, W., “HDR video coding for aerial videos with VVC and AV1,” in International Society for Optics and Photonics, 118420J (2021).
[20] Barman, N. and Martini, M. G., “User generated hdr gaming video streaming: dataset, codec comparison and challenges,” IEEE Trans. on Circuits and Systems for Video Technology (2021).
[21] Zhou, M., Wei, X., Wang, S., Kwong, S., Fong, C.-K., Wong, P. H. W., and Yuen, W. Y. F., “Global Rate-Distortion Optimization-Based Rate Control for HEVC HDR Coding,” IEEE Trans. on Circuits and Systems for Video Technology, 30(12), 4648–4662 (2020). https://doi.org/10.1109/TCSVT.76
[22] Mantiuk, R., Kim, K. J., Rempel, A. G., and Heidrich, W., “HDR-VDP-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions,” ACM Trans. on Graphics, ACM, New York, NY, USA (2011). https://doi.org/10.1145/2010324.1964935
[23] Wang, Z., Simoncelli, E., and Bovik, A., “Multiscale structural similarity for image quality assessment,” in 37th Asilomar Conference on Signals, Systems Computers, 2003, 1398–1402 (2003).
[24] Wu, P.-H., Kondratenko, V., and Katsavounidis, I., “Fast encoding parameter selection for convex hull video encoding,” Applications of Digital Image Processing XLIII, 11510, 181–194, SPIE (2020).
[25] Wu, P.-H., Kondratenko, V., Chaudhari, G., and Katsavounidis, I., “Encoding parameters prediction for convex hull video encoding,” in 2021 Picture Coding Symposium (PCS), 1–5 (2021).
[26] Gao, F. and Han, L., “Implementing the nelder-mead simplex algorithm with adaptive parameters,” Computational Optimization and Applications, 51(1), 259–277 (2012). https://doi.org/10.1007/s10589-010-9329-3
[27] Nocedal, J. and Wright, S. J., “Conjugate gradient methods,” Numerical Optimization, 101–134 (2006). https://doi.org/10.1007/978-0-387-40065-5
[28] Powell, M. J., “An efficient method for finding the minimum of a function of several variables without calculating derivatives,” The Computer Journal, 7(2), 155–162 (1964). https://doi.org/10.1093/comjnl/7.2.155
[29] Xin, Z., Zhijun(Ryan), L., Andrey, N., Thomas, D., and Alexis, T., “AOM Common Test Conditions v2.0,” Alliance for Open Media, Codec Working Group Output Document CWG/B075o (2021). https://aomedia.org/docs/CWG-B075o_AV2_CTC_v2.pdf
[30] ITU-T and ISO/IEC, “HDRTools package [Online],” (2015). https://gitlab.com/standards/HDRTools
[31] Recommendation, I., “Subjective video quality assessment methods for multimedia applications,” ITU-T, 910 (2021).
[32] Netflix, “Netflix open content.” https://opencontent.netflix.com/
[33] Josef, A., Olof, L., Marcus, L., and Fredrik, L., “SVT OpenContent Video Test Suite 2022 – Natural Complexity,” Sveriges Television AB (2022). https://www.svt.se/open/en/content/
[34] Cable Television Laboratories, I., “4K Video Set,” (2014). https://www.cablelabs.com/4k
[35] Liu, Z., Mukherjee, D., Lin, W.-T., Wilkins, P., Han, J., and Xu, Y., “Adaptive multi-reference prediction using a symmetric framework,” Electronic Imaging, 2017(2), 65–72 (2017). https://doi.org/10.2352/ISSN.2470-1173.2017.2.VIPC-409
[36] Chen, C., Han, J., and Xu, Y., “A hybrid weighted compound motion compensated prediction for video compression,” in Picture Coding Symposium (PCS), 223–227 (2018).
[37] Netflix, “VMAF - Video Multi-Method Assessment Fusion,” (2016). https://github.com/Netflix/vmaf
[38] Xiph.Org Foundation, “AreWeCompressedYet, AWCY Source [Online],” (2015). https://github.com/xiph/awcy
[39] Lin, J. Y., Liu, T.-J., Wu, E. C.-H., and Kuo, C.-C. J., “A fusion-based video quality assessment (FVQA) index,” in Signal and Information Processing Association Annual Summit and Conference (APSIPA) (2014). https://doi.org/10.1109/APSIPA.2014.7041705