Content-Adaptive Bitrate Ladder Estimation in High-Efficiency Video Coding Utilizing Spatiotemporal Resolutions
Abstract
1. Introduction
2. Related Work
3. Database Development
- ffmpeg -s:v 3840x2160 -i Beauty_3840x2160.yuv -vf scale=1920:1080 Beauty_1920x1080.yuv
- ffmpeg -s:v 3840x2160 -itsscale 0.5 -i Beauty_3840x2160.yuv Beauty_3840x2160_60fps.yuv
- ffmpeg -s:v 3840x2160 -r 120 -i Beauty_3840x2160.yuv -b:v 700k -c:v libx265 Beauty_3840_700kb_120fps.265
- ffmpeg -i Beauty_3840_10000kb_120fps.265 -vf scale=3840:2160:flags=bilinear Beauty_3840_10000kb_120fps_bilin.265
- ffmpeg.exe -i "Beauty_3840_10000kb_120fps_bilin.265" -filter:v "minterpolate='fps=120:mi_mode=mci:mc_mode=aobmc:me_mode=bidir:me=epzs'" "Beauty_3840_10000kb_120fps_bilin_mint.265"
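The encoding command above is repeated for every combination of spatial resolution, frame rate, and bitrate. The following is a minimal sketch of how such a batch of command lines could be generated. It is not the authors' tooling. The bitrates are taken from the resolution/bitrate table of this article, while the input file naming scheme, the helper names `encode_command` and `all_commands`, and the restriction to 60 and 120 fps are assumptions made for illustration.

```python
from itertools import product

# Bitrates in kbps per spatial resolution, copied from the article's
# resolution/bitrate table.
LADDER_KBPS = {
    (480, 360): [50, 250, 500, 700],
    (1280, 720): [50, 250, 700, 2400],
    (1920, 1080): [250, 700, 2400, 6200],
    (3840, 2160): [700, 2400, 6200, 10000],
}


def encode_command(name, width, height, fps, kbps):
    """Build the ffmpeg argument list for one libx265 encode of a raw YUV clip."""
    # Hypothetical input naming scheme; the article's own file names differ slightly.
    src = f"{name}_{width}x{height}_{fps}fps.yuv"
    dst = f"{name}_{width}_{kbps}kb_{fps}fps.265"
    return [
        "ffmpeg",
        "-s:v", f"{width}x{height}",  # raw video has no header, so the size is given explicitly
        "-r", str(fps),               # input frame rate
        "-i", src,
        "-b:v", f"{kbps}k",           # target bitrate
        "-c:v", "libx265",
        dst,
    ]


def all_commands(name, frame_rates):
    """Enumerate one encode command per (resolution, frame rate, bitrate) triple."""
    commands = []
    for (width, height), rates in LADDER_KBPS.items():
        for fps, kbps in product(frame_rates, rates):
            commands.append(encode_command(name, width, height, fps, kbps))
    return commands
```

Calling `all_commands("Beauty", [60, 120])` returns the argument lists without running them; each list can be printed for inspection or passed to `subprocess.run` once the input files exist.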
4. Models for BR, TR, and SSIM Estimation
4.1. Data Augmentation Process
4.2. Bitrate Ladder Switching Point Prediction
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Spatial Resolution | 480 × 360 | 1280 × 720 | 1920 × 1080 | 3840 × 2160 |
---|---|---|---|---|
Bitrate [kbps] | 50 | 50 | 250 | 700 |
 | 250 | 250 | 700 | 2400 |
 | 500 | 700 | 2400 | 6200 |
 | 700 | 2400 | 6200 | 10,000 |
Unit | Initial Value | Stop Value | Target Value |
---|---|---|---|
Epoch | 0 | 17 | 1000 |
Performance (MSE) | 305 | 1.2 | 0 |
Gradient | 469 | 0.384 | 1 × 10⁻⁷ |
Mu | 0.001 | 0.01 | 1 × 10¹⁰ |
Validation checks | 0 | 6 | 6 |
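In the table above, the validation-check count reaches its target of 6 while the epoch count, performance, and gradient remain far from theirs, which suggests that training ended on the validation criterion. The sketch below illustrates one common form of that rule: stop once the validation error has failed to improve on its best value for a fixed number of consecutive epochs. The function name `stop_epoch` and the exact counting rule are assumptions; the training software used in the article may count failures differently.

```python
def stop_epoch(val_errors, max_fail=6):
    """Return the index of the epoch at which training stops, or None.

    val_errors[i] is the validation error measured after epoch i.
    Training stops once the error has failed to improve on its best
    value for max_fail consecutive epochs.
    """
    best = float("inf")
    fails = 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            # New best validation error: reset the failure counter.
            best = err
            fails = 0
        else:
            fails += 1
            if fails >= max_fail:
                return epoch
    return None
```

With `max_fail=6`, a run whose validation error bottoms out at epoch 2 and never improves again would stop at epoch 8.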
Data Subset | Observations | MSE | R |
---|---|---|---|
Training | 314 | 1.3763 | 0.9097 |
Validation | 67 | 1.2566 | 0.9363 |
Test | 67 | 1.3538 | 0.8989 |
Additional test | 64 | 22.8856 | 0.7913 |
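The MSE and R columns above can be reproduced from paired target and predicted values. The following sketch assumes that R denotes the Pearson correlation coefficient between targets and predictions, which is the usual convention in regression reports but is not stated explicitly in this part of the article. The function names are illustrative.

```python
import math


def mse(targets, predictions):
    """Mean squared error between paired target and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def pearson_r(targets, predictions):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    # Sums of centered products; the common 1/n factor cancels in the ratio.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)
```

Because R is insensitive to offset and scale, a model can score a high R and still have a large MSE, so the two columns are best read together.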
Unit | Initial Value | Stop Value | Target Value |
---|---|---|---|
Epoch | 0 | 56 | 1000 |
Performance (MSE) | 7.15 × 10³ | 133 | 0 |
Gradient | 1.64 × 10⁴ | 294 | 1 × 10⁻⁷ |
Mu | 0.001 | 0.01 | 1 × 10¹⁰ |
Validation checks | 0 | 6 | 6 |
Data Subset | Observations | R |
---|---|---|
Training | 1058 | 0.9147 |
Validation | 227 | 0.9011 |
Test | 227 | 0.9218 |
Additional test 1 | 19 | 0.7537 |
Additional test 2 | 22 | 0.7202 |
Unit | Initial Value | Stop Value | Target Value |
---|---|---|---|
Epoch | 0 | 71 | 1000 |
Performance | 4.34 × 10⁷ | 5.51 × 10⁵ | 0 |
Gradient | 6.74 × 10⁷ | 8.84 × 10⁴ | 1 × 10⁻⁶ |
Validation checks | 0 | 6 | 6 |
Data Subset | Observations | R |
---|---|---|
Training | 1058 | 0.7482 |
Validation | 227 | 0.7827 |
Test | 227 | 0.7703 |
Additional test 1 | 19 | 0.8380 |
Additional test 2 | 22 | 0.8404 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Šuljug, J.; Rimac-Drlje, S. Content-Adaptive Bitrate Ladder Estimation in High-Efficiency Video Coding Utilizing Spatiotemporal Resolutions. Electronics 2024, 13, 4049. https://doi.org/10.3390/electronics13204049