Block Partitioning Information-Based CNN Post-Filtering for EVC Baseline Profile
Abstract
1. Introduction
1. A CNN-based post-filter for the EVC Baseline profile was developed, offering a promising video coding solution for IoT devices.
2. The major artifacts of the EVC Baseline profile were analyzed, and a method for indicating the areas where these artifacts appear was exploited.
3. A guide map based on block partitioning information was incorporated to identify attention areas and enhance the visual quality of the target image or video.
4. Low-complexity IoT applications were considered, so that IoT devices can apply the post-filter selectively according to the extra computing power available.
5. A scenario-based CNN post-processing network was developed for real IoT applications, whether image-based or real-time broadcasting/streaming services.
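The guide map in contribution (3) can be pictured as a mask that highlights block boundaries, where blocking artifacts concentrate. The sketch below is a minimal illustration of that idea; the function name, the `(y, x, h, w)` block representation, and the one-pixel boundary-marking rule are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def build_guide_map(height, width, blocks):
    """Mark pixels lying on coding-block boundaries, where blocking
    artifacts tend to appear, given decoded partitioning information.

    blocks: list of (y, x, h, w) block rectangles inside the frame.
    """
    guide = np.zeros((height, width), dtype=np.float32)
    for y, x, h, w in blocks:
        guide[y, x:x + w] = 1.0                            # top edge
        guide[min(y + h - 1, height - 1), x:x + w] = 1.0   # bottom edge
        guide[y:y + h, x] = 1.0                            # left edge
        guide[y:y + h, min(x + w - 1, width - 1)] = 1.0    # right edge
    return guide

# Example: a 16x16 frame split into four 8x8 blocks.
blocks = [(0, 0, 8, 8), (0, 8, 8, 8), (8, 0, 8, 8), (8, 8, 8, 8)]
guide = build_guide_map(16, 16, blocks)
```

Such a map can be concatenated with the decoded frame as an extra input channel so the network attends to boundary regions.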
2. Related Work
2.1. Overview of EVC Baseline Profile
2.2. CNN-Based Filtering Technologies for Video Coding
2.3. Neural Network-Based Video Coding
3. CNN-Based Post-Filtering with Block Partitioning Information
3.1. Analysis of Coding Artifacts
3.2. Architecture and Network
3.3. Training
4. Experimental Results and Discussion
4.1. Objective Testing Result
4.2. Subjective Testing Result
4.3. Discussion
4.4. Future Work
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Setting | Value |
|---|---|
| Training dataset | BVI-DVC |
| Videos | 800 videos with 10 frames |
| Framework | PyTorch 1.13.0 |
| Epochs | 50 |
| Optimizer | Adam optimizer with a learning rate of |
| Models | Five models at QP 22, 27, 32, 37, and 42 for AI; five models at QP 22, 27, 32, 37, and 42 for LD |
| Anchor encoder | XEVE with Baseline profile setting |
| Anchor decoder | XEVD with Baseline profile setting |
| Hardware | AMD EPYC 7513 32-core CPU, 384 GB RAM (AMD, Santa Clara, CA, USA), and an NVIDIA A6000 GPU (NVIDIA, Santa Clara, CA, USA) |
| Setting | Value |
|---|---|
| Test dataset | Class A1 (4K): Tango2, FoodMarket4, Campfire; Class A2 (4K): CatRobot, DaylightRoad2, ParkRunning3; Class B (2K): MarketPlace, RitualDance, Cactus, BasketballDrive, BQTerrace; Class C (WVGA): BasketballDrill, BQMall, PartyScene, RaceHorses; Class D (WQVGA): BasketballPass, BQSquare, BlowingBubbles, RaceHorses |
| Frames | Full frames |
| Framework | PyTorch |
| Models | Five models at QP 22, 27, 32, 37, and 42 for AI; five models at QP 22, 27, 32, 37, and 42 for LD |
| Anchor encoder | XEVE with Baseline profile setting |
| Anchor decoder | XEVD with Baseline profile setting |
| Hardware | AMD EPYC 7513 32-core CPU, 384 GB RAM, and an NVIDIA A6000 GPU |
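With five models trained per configuration, one per QP, a deployed decoder needs a rule for choosing which model to apply to a given stream. A plausible rule (an assumption for illustration, since the exact selection logic is not specified here) is to pick the model trained at the QP nearest to the coding QP:

```python
# One trained model per QP, per configuration (AI or LD).
TRAINED_QPS = [22, 27, 32, 37, 42]

def select_model_qp(coding_qp, trained_qps=TRAINED_QPS):
    """Return the training QP of the model closest to the coding QP
    (ties, if any, resolved toward the lower QP)."""
    return min(trained_qps, key=lambda qp: (abs(qp - coding_qp), qp))

# Example: a sequence coded at QP 30 maps to the QP32 model.
```

This keeps the per-device footprint fixed at five small models while covering the whole QP range.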
| Class | Sequence | Bitrate (kbps) | Ref. Y-PSNR (dB) | Ref. U-PSNR (dB) | Ref. V-PSNR (dB) | Prop. Y-PSNR (dB) | Prop. U-PSNR (dB) | Prop. V-PSNR (dB) | ΔY-PSNR (dB) | ΔU-PSNR (dB) | ΔV-PSNR (dB) | Y BD-BR (%) | U BD-BR (%) | V BD-BR (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | Tango2 | 62,688 | 38.91 | 46.67 | 44.71 | 39.21 | 47.74 | 45.81 | 0.30 | 1.07 | 1.10 | −11.42 | −41.06 | −38.15 |
| A1 | FoodMarket4 | 121,128 | 39.01 | 43.32 | 44.52 | 39.53 | 44.18 | 45.60 | 0.52 | 0.86 | 1.08 | −12.49 | −24.31 | −29.93 |
| A1 | Campfire | 76,616 | 37.58 | 39.26 | 40.26 | 37.83 | 40.51 | 41.07 | 0.25 | 1.25 | 0.81 | −6.53 | −33.13 | −33.41 |
| A2 | CatRobot | 122,884 | 37.73 | 40.35 | 40.85 | 38.27 | 41.08 | 41.89 | 0.53 | 0.73 | 1.04 | −14.76 | −36.53 | −36.67 |
| A2 | DaylightRoad2 | 145,191 | 36.37 | 43.31 | 41.39 | 36.71 | 44.06 | 41.74 | 0.34 | 0.75 | 0.35 | −11.83 | −40.85 | −21.43 |
| A2 | ParkRunning3 | 227,250 | 38.12 | 35.40 | 36.45 | 38.61 | 35.61 | 36.66 | 0.49 | 0.21 | 0.21 | −8.34 | −5.66 | −7.67 |
| B | MarketPlace | 42,551 | 37.16 | 41.66 | 42.46 | 37.54 | 42.43 | 43.15 | 0.38 | 0.78 | 0.69 | −9.35 | −29.12 | −28.89 |
| B | RitualDance | 28,415 | 39.31 | 43.95 | 44.30 | 40.10 | 45.05 | 45.76 | 0.79 | 1.10 | 1.46 | −15.27 | −33.28 | −39.10 |
| B | Cactus | 47,502 | 35.36 | 38.73 | 40.63 | 35.84 | 39.11 | 41.41 | 0.48 | 0.38 | 0.78 | −12.07 | −17.94 | −28.80 |
| B | BasketballDrive | 31,843 | 36.65 | 42.18 | 42.70 | 37.09 | 42.29 | 43.31 | 0.43 | 0.11 | 0.61 | −11.05 | −5.96 | −22.38 |
| B | BQTerrace | 80,937 | 34.93 | 40.21 | 42.31 | 35.40 | 40.28 | 42.45 | 0.48 | 0.07 | 0.13 | −7.96 | −5.23 | −9.76 |
| C | BasketballDrill | 11,741 | 35.27 | 39.88 | 39.93 | 36.25 | 40.69 | 41.63 | 0.98 | 0.81 | 1.70 | −18.24 | −25.33 | −40.74 |
| C | BQMall | 12,610 | 35.60 | 40.53 | 41.45 | 36.38 | 41.36 | 42.60 | 0.78 | 0.83 | 1.15 | −14.26 | −26.02 | −32.56 |
| C | PartyScene | 22,222 | 32.96 | 38.21 | 38.81 | 33.45 | 38.75 | 39.45 | 0.49 | 0.54 | 0.64 | −8.19 | −14.89 | −16.41 |
| C | RaceHorses | 7724 | 35.33 | 38.58 | 39.98 | 35.90 | 39.61 | 41.30 | 0.57 | 1.03 | 1.32 | −11.12 | −27.99 | −38.47 |
| D | BasketballPass | 2895 | 35.85 | 40.78 | 40.28 | 36.67 | 41.92 | 41.68 | 0.81 | 1.14 | 1.39 | −13.41 | −28.42 | −31.52 |
| D | BQSquare | 7108 | 32.98 | 39.71 | 40.52 | 33.75 | 40.15 | 41.38 | 0.77 | 0.44 | 0.86 | −10.45 | −12.45 | −23.37 |
| D | BlowingBubbles | 5886 | 32.85 | 37.96 | 38.37 | 33.40 | 38.53 | 39.16 | 0.55 | 0.57 | 0.79 | −9.57 | −16.97 | −21.52 |
| D | RaceHorses | 2352 | 34.74 | 37.99 | 39.01 | 35.63 | 39.64 | 40.93 | 0.89 | 1.65 | 1.93 | −14.38 | −40.42 | −46.22 |
| | Average | | | | | | | | 0.57 | 0.75 | 0.95 | −11.62 | −24.50 | −28.79 |
| Class | Sequence | Ref. Y-PSNR (dB) | Ref. U-PSNR (dB) | Ref. V-PSNR (dB) | Prop. Y-PSNR (dB) | Prop. U-PSNR (dB) | Prop. V-PSNR (dB) | ΔY-PSNR (dB) | ΔU-PSNR (dB) | ΔV-PSNR (dB) | Y BD-BR (%) | U BD-BR (%) | V BD-BR (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A1 | Tango2 | 36.94 | 45.84 | 43.38 | 37.17 | 46.69 | 44.26 | 0.23 | 0.85 | 0.88 | −8.57 | −59.49 | −49.19 |
| A1 | FoodMarket4 | 35.95 | 41.03 | 41.92 | 36.19 | 42.28 | 43.40 | 0.25 | 1.26 | 1.48 | −7.01 | −63.93 | −68.60 |
| A1 | Campfire | 35.49 | 37.26 | 39.01 | 35.74 | 38.11 | 39.68 | 0.25 | 0.85 | 0.67 | −7.28 | −27.81 | −34.49 |
| A2 | CatRobot | 35.28 | 39.50 | 39.56 | 35.61 | 40.18 | 40.48 | 0.33 | 0.69 | 0.93 | −10.55 | −56.66 | −49.27 |
| A2 | DaylightRoad2 | 33.97 | 41.81 | 39.97 | 34.16 | 42.75 | 40.67 | 0.19 | 0.94 | 0.70 | −8.88 | −71.13 | −58.59 |
| A2 | ParkRunning3 | 34.05 | 33.09 | 34.51 | 34.30 | 33.33 | 34.82 | 0.26 | 0.24 | 0.31 | −5.86 | −12.42 | −17.60 |
| B | MarketPlace | 33.98 | 40.00 | 40.89 | 34.18 | 40.91 | 41.63 | 0.20 | 0.91 | 0.74 | −6.59 | −63.16 | −56.89 |
| B | RitualDance | 35.71 | 42.31 | 42.43 | 36.14 | 43.36 | 43.72 | 0.43 | 1.05 | 1.29 | −9.36 | −51.85 | −54.47 |
| B | Cactus | 32.71 | 37.87 | 39.64 | 33.05 | 38.38 | 40.37 | 0.34 | 0.50 | 0.73 | −11.87 | −48.07 | −44.86 |
| B | BasketballDrive | 33.77 | 40.74 | 40.77 | 34.13 | 41.39 | 41.79 | 0.36 | 0.65 | 1.02 | −11.47 | −45.44 | −48.28 |
| B | BQTerrace | 31.19 | 37.83 | 39.75 | 31.52 | 38.72 | 40.86 | 0.32 | 0.89 | 1.11 | −13.79 | −65.73 | −70.76 |
| C | BasketballDrill | 31.83 | 37.89 | 37.72 | 32.43 | 39.11 | 39.20 | 0.60 | 1.22 | 1.48 | −14.77 | −48.76 | −49.42 |
| C | BQMall | 31.73 | 38.74 | 39.58 | 32.23 | 39.87 | 40.91 | 0.50 | 1.13 | 1.32 | −12.67 | −57.01 | −58.53 |
| C | PartyScene | 28.39 | 36.39 | 37.13 | 28.74 | 36.97 | 37.67 | 0.35 | 0.58 | 0.55 | −10.83 | −29.20 | −26.30 |
| C | RaceHorses | 31.52 | 36.94 | 38.55 | 31.91 | 37.66 | 39.53 | 0.39 | 0.72 | 0.98 | −10.72 | −41.02 | −54.39 |
| D | BasketballPass | 31.88 | 39.26 | 38.18 | 32.44 | 39.91 | 39.10 | 0.57 | 0.65 | 0.92 | −12.35 | −27.59 | −32.37 |
| D | BQSquare | 27.79 | 38.10 | 38.60 | 28.37 | 38.94 | 39.74 | 0.58 | 0.84 | 1.14 | −18.51 | −60.08 | −68.07 |
| D | BlowingBubbles | 28.37 | 35.93 | 36.49 | 28.68 | 36.54 | 37.01 | 0.31 | 0.60 | 0.52 | −9.49 | −31.72 | −26.01 |
| D | RaceHorses | 30.77 | 36.34 | 37.35 | 31.31 | 37.36 | 38.65 | 0.54 | 1.03 | 1.30 | −12.68 | −44.79 | −52.78 |
| | Average | | | | | | | 0.37 | 0.82 | 0.95 | −10.70 | −47.68 | −48.47 |
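The BD-PSNR and BD-BR columns in the tables above are Bjøntegaard delta metrics: cubic fits of the rate-distortion curves are compared over their overlapping range. The sketch below is a generic implementation of that computation over four rate-distortion points, not the exact script used for the paper's numbers.

```python
import numpy as np

def _avg_diff(x_anchor, y_anchor, x_test, y_test):
    """Cubic-fit both curves, integrate (test - anchor) over the
    overlapping x interval, and return the average gap."""
    x_a, x_t = np.asarray(x_anchor, float), np.asarray(x_test, float)
    p_a = np.polyfit(x_a, y_anchor, 3)
    p_t = np.polyfit(x_t, y_test, 3)
    lo, hi = max(x_a.min(), x_t.min()), min(x_a.max(), x_t.max())
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    return (int_t - int_a) / (hi - lo)

def bd_psnr(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """BD-PSNR: average PSNR gain (dB) at equal bitrate."""
    return _avg_diff(np.log10(rate_anchor), psnr_anchor,
                     np.log10(rate_test), psnr_test)

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """BD-BR: average bitrate change (%) at equal PSNR (negative = saving)."""
    diff = _avg_diff(psnr_anchor, np.log10(rate_anchor),
                     psnr_test, np.log10(rate_test))
    return (10.0 ** diff - 1.0) * 100.0
```

For a sanity check, a test curve uniformly 0.5 dB above the anchor yields a BD-PSNR of 0.5 dB, and a test curve reaching the same PSNR at half the bitrate yields a BD-BR of −50%.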
Share and Cite
Choi, K. Block Partitioning Information-Based CNN Post-Filtering for EVC Baseline Profile. Sensors 2024, 24, 1336. https://doi.org/10.3390/s24041336