A Trusted Supervision Paradigm for Autonomous Driving Based on Multimodal Data Authentication
Abstract
1. Introduction
- A trusted supervision paradigm for autonomous driving (TSPAD) based on multimodal data authentication is proposed for the timely and intelligent quantification and recording of the safety driver's abnormal behavior in autonomous driving. TSPAD is an innovative framework designed to address the demands of real-world business scenarios, and its feasibility is thoroughly validated in this paper.
- An encrypted evidence storage method based on a dual-chain blockchain architecture is designed to improve sharing efficiency and trustworthiness among multiple stakeholders. This method integrates a private key chain for secure identity authentication with a notary chain for reliable data verification, ensuring both data integrity and privacy protection in distributed environments.
- The abnormal behavior of safety drivers in autonomous driving is quantified, and key frames are extracted through deep learning models. This paper designs several optional key frame selection strategies to cope with different application scenarios: for scenarios demanding high speed, the center (intermediate) frame or random frame selection strategy is adopted; for scenarios demanding high accuracy, the adaptive clustering key frame selection strategy can be utilized.
- An image compression algorithm is employed to alleviate the burden on distributed storage. In the context of autonomous driving, a compression algorithm accelerated by graphics processing units (GPUs) leverages the vehicle's onboard computational power to reduce storage overhead while preserving maximum image detail.
2. Related Work
2.1. Blockchain for Traffic Incident and Accident Management
2.2. Abnormal Behavior Detection
2.3. Video Coding Standards
3. Methods
3.1. Design of Blockchain for Trusted Evidence Preservation
3.1.1. Dual-Chain Blockchain Architecture
3.1.2. Blockchain-Based Evidence Notarization Process
- Transaction initiation. After reaching a transaction agreement off-chain, the enterprise generates an asymmetric key pair and initiates a new transaction with the logistics platform, transmitting its public key for data encryption to the logistics platform.
- Transaction public–private key distribution. Upon receiving this transaction request, the logistics platform generates a corresponding pair of transaction asymmetric keys and a business transaction ID. The transaction content and the transaction private key are encrypted using the public key provided by the enterprise. The encrypted data are serialized and packaged into a blockchain transaction, which is then uploaded to the private key chain with the business transaction ID as the blockchain transaction index. The generated transaction public key is distributed to the vehicle executing the task to encrypt the data recorded during transportation. After the transaction is uploaded to the chain, the logistics platform returns the business transaction ID to the enterprise, which retrieves the corresponding block on the private key chain using this ID and decrypts the transaction private key with its own private key.
- Transportation information on-chain. The vehicle computes a hash of the recorded driving data or of the compressed video of the driver's abnormal behavior. This hash value can be used to verify whether the data have been tampered with during subsequent evidence collection. Using a hybrid encryption method, the vehicle generates a symmetric key to encrypt the data and encrypts the symmetric key with the transaction public key. The encrypted data and key are then written onto the notary chain with the business transaction ID as the blockchain transaction index. This approach leverages the security of asymmetric encryption to safely transmit the symmetric key, and then exploits the efficiency of symmetric encryption to encrypt large amounts of data.
- Retrieval of transportation information. The enterprise creates a subscriber channel within the notary chain through the blockchain's message subscription mechanism. When a new transaction is generated, the blockchain sends a notification through the subscriber channel. The enterprise can then retrieve the blocks associated with its transactions using the business transaction ID and decrypt the symmetric key with the transaction private key. In this way, enterprises gain real-time access to information on vehicle operations and cargo transportation.
- Regulatory intervention. The regulatory authority holds its own asymmetric key pair. When it needs to access the transportation information of a particular transaction, it initiates a request to the logistics platform, transmitting its public key and the business transaction ID to be queried. Upon verifying the digital identity of the regulatory authority, the logistics platform uses the transmitted public key to encrypt the transaction private key. The logistics platform then generates a query ID to serve as the index of a blockchain transaction and uploads the encrypted transaction private key to the private key chain. Finally, the logistics platform returns the query ID to the regulatory authority, which uses this ID and its own private key to retrieve the transaction private key from the private key chain, and thereby access the desired transportation information from the notary chain.
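The hash-then-hybrid-encrypt step at the core of this notarization flow can be sketched in a few lines. This is only an illustrative mock: the XOR keystream below stands in for a real symmetric cipher such as AES, the asymmetric key exchange and on-chain storage are omitted, and all function names are our own.

```python
import hashlib
import os

def keystream_xor(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher: XOR with a SHA-256-derived keystream.
    # Stands in for AES here; do NOT use for real security.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def notarize(record: bytes, sym_key: bytes) -> dict:
    # Vehicle side: hash the record for later tamper checks,
    # then encrypt it with the symmetric key.
    return {
        "hash": hashlib.sha256(record).hexdigest(),
        "ciphertext": keystream_xor(record, sym_key),
    }

def verify(entry: dict, sym_key: bytes) -> bytes:
    # Enterprise/regulator side: decrypt, then check integrity
    # against the on-chain hash.
    plain = keystream_xor(entry["ciphertext"], sym_key)
    if hashlib.sha256(plain).hexdigest() != entry["hash"]:
        raise ValueError("data tampered with")
    return plain

key = os.urandom(32)
entry = notarize(b"vehicle telemetry frame", key)
assert verify(entry, key) == b"vehicle telemetry frame"
```

Any single-bit change to the ciphertext makes `verify` raise, which is exactly the tamper-evidence property the hash on the notary chain provides.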
3.2. Abnormal Driving Behavior Detection
3.2.1. Detection Framework Based on Key Frames Adaptive Cluster
3.2.2. Adaptive Cluster Algorithm
Algorithm 1: Adaptive Cluster Algorithm.
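The adaptive clustering idea can be illustrated with a simple greedy variant. This is a sketch, not the paper's exact Algorithm 1: the distance threshold, the running-centroid update, and the frame features are all illustrative assumptions.

```python
import math

def adaptive_cluster_keyframes(features, threshold):
    """Greedily cluster consecutive frame features; a new cluster opens
    whenever a frame is farther than `threshold` from the running centroid.
    Returns the index of the frame nearest each centroid (one key frame per
    cluster), so the number of key frames adapts to the video content."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    clusters = []  # each: {"indices": [...], "centroid": [...]}
    for i, f in enumerate(features):
        if clusters and dist(f, clusters[-1]["centroid"]) <= threshold:
            c = clusters[-1]
            c["indices"].append(i)
            n = len(c["indices"])
            c["centroid"] = [(m * (n - 1) + x) / n
                             for m, x in zip(c["centroid"], f)]
        else:
            clusters.append({"indices": [i], "centroid": list(f)})

    return [min(c["indices"], key=lambda i: dist(features[i], c["centroid"]))
            for c in clusters]

# Two visually distinct segments -> one key frame per segment.
feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
print(adaptive_cluster_keyframes(feats, threshold=1.0))  # -> [0, 3]
```

In the paper's pipeline, the features would come from a backbone network (e.g., a ResNet embedding per frame) rather than the toy 2-D vectors used here.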
3.3. Intra–Intercoding-Based Video Compression
3.3.1. Discrete Cosine Transform
- Two-dimensional DCT. Let there be an image of size $N \times N$, where $f(i,j)$ represents the value of the pixel at row $i$ and column $j$. The image is transformed from the pixel matrix $f$ to the frequency-domain matrix $F$ after the DCT, where $u$ and $v$ represent the indexes of the horizontal and vertical frequency components, respectively. The definition of $F(u,v)$ is as follows:

$$F(u,v) = c(u)\,c(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i,j)\, \cos\frac{(2i+1)u\pi}{2N}\, \cos\frac{(2j+1)v\pi}{2N}, \qquad c(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k \neq 0 \end{cases}$$
- Two-dimensional inverse discrete cosine transform (IDCT). The inverse transform is given by:

$$f(i,j) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} c(u)\,c(v)\,F(u,v)\, \cos\frac{(2i+1)u\pi}{2N}\, \cos\frac{(2j+1)v\pi}{2N}$$
- Matrix formulation of DCT. The DCT formula is characterized by cyclic summation, which does not effectively harness the efficient matrix computation capabilities of GPUs. Therefore, the summation formula is often converted into matrix form to enhance computational efficiency. The specific formula is as follows:

$$F = A f A^{T}, \qquad A(u,i) = c(u)\, \cos\frac{(2i+1)u\pi}{2N}$$

Since $A$ is orthogonal, the inverse transform is simply $f = A^{T} F A$.
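As a numerical check of the matrix formulation, the sketch below builds the DCT basis matrix and verifies that $f = A^{T} F A$ recovers the original block. Plain Python lists stand in here for the GPU matrix kernels discussed above.

```python
import math

def dct_matrix(n):
    # A[u][i] = c(u) * cos((2i + 1) u pi / (2n)); rows form an orthonormal basis.
    A = []
    for u in range(n):
        c = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        A.append([c * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                  for i in range(n)])
    return A

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

n = 4
A = dct_matrix(n)
f = [[float(4 * i + j) for j in range(n)] for i in range(n)]  # a toy 4x4 block
F = matmul(matmul(A, f), transpose(A))       # forward: F = A f A^T
f_rec = matmul(matmul(transpose(A), F), A)   # inverse: f = A^T F A

assert all(abs(f_rec[i][j] - f[i][j]) < 1e-9 for i in range(n) for j in range(n))
```

The same two matrix products, batched over all blocks, are what make the transform GPU-friendly compared with the per-coefficient summation form.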
3.3.2. Motion Compensation
- Compare pixel differences between adjacent frames to estimate the direction and magnitude of macroblock motion;
- Use the results of motion estimation for inter-frame prediction;
- Generate the residual output by computing the difference between actual values and predicted values.
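The three steps above can be sketched as an exhaustive block-matching search; the block size, search range, and SAD matching criterion below are illustrative choices, not the paper's exact configuration.

```python
def sad(a, b):
    # Sum of absolute differences between two equal-size blocks.
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def crop(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def motion_estimate(prev, cur, block=8, search=4):
    """For each block of `cur`, find the offset (dy, dx) in `prev` within
    +/-`search` pixels minimizing the SAD (step 1), build the motion-
    compensated prediction (step 2), and output the residual (step 3)."""
    h, w = len(cur), len(cur[0])
    vectors, pred = {}, [[0] * w for _ in range(h)]
    for y in range(0, h, block):
        for x in range(0, w, block):
            target = crop(cur, y, x, block)
            best, best_sad = (0, 0), float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    sy, sx = y + dy, x + dx
                    if sy < 0 or sx < 0 or sy + block > h or sx + block > w:
                        continue  # candidate block falls outside the frame
                    cost = sad(target, crop(prev, sy, sx, block))
                    if cost < best_sad:
                        best_sad, best = cost, (dy, dx)
            vectors[(y, x)] = best
            dy, dx = best
            for r in range(block):
                for c in range(block):
                    pred[y + r][x + c] = prev[y + dy + r][x + dx + c]
    residual = [[cur[r][c] - pred[r][c] for c in range(w)] for r in range(h)]
    return vectors, pred, residual

# A bright patch that shifts 2 pixels to the right between frames:
prev = [[0] * 24 for _ in range(24)]
for r in range(8, 12):
    for c in range(8, 12):
        prev[r][c] = 200
cur = [[prev[r][(c - 2) % 24] for c in range(24)] for r in range(24)]
vectors, pred, residual = motion_estimate(prev, cur)
print(vectors[(8, 8)])                                # -> (0, -2)
print(sum(abs(v) for row in residual for v in row))   # -> 0
```

Because the motion is a pure translation, the prediction is exact and the residual vanishes; in real video the residual is small but nonzero, which is what makes it cheap to transform and quantize.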
3.3.3. AV1 Hybrid Coding
- Segmenting the single-frame image into macroblocks of different sizes;
- Flexibly selecting from 56 prediction modes based on image texture to obtain intraframe prediction signals;
- Calculating the residual signal by subtracting the predicted macroblocks from the original macroblocks;
- Applying a discrete cosine transform and quantization to the residual signal to remove high-frequency information;
- Reconstructing the residual signal via inverse quantization and inverse transformation of the frequency-domain residual, completing the lossy compression;
- Adding the residual signal to the intraframe prediction signal and using loop filtering to mitigate block artifacts;
- Outputting the result through entropy coding.
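Steps 4–5 are where the loss is introduced. The sketch below shows a quantization round trip on a toy block of DCT coefficients; the uniform step size is an illustrative assumption, not AV1's actual quantizer design.

```python
def quantize(coeffs, step):
    # Uniform scalar quantization: coefficients smaller than ~step/2 round
    # to zero, which discards mostly high-frequency detail -- the lossy step.
    return [[round(v / step) for v in row] for row in coeffs]

def dequantize(q, step):
    return [[v * step for v in row] for row in q]

# A toy 4x4 block of DCT coefficients: energy concentrated at low frequencies
# (top-left), as is typical after transforming a natural-image residual.
coeffs = [[620.0, -31.2,  4.1, 0.9],
          [-45.8,  12.5, -1.3, 0.4],
          [  6.2,  -2.1,  0.7, 0.2],
          [ -1.0,   0.5, -0.3, 0.1]]

q = quantize(coeffs, step=8.0)
rec = dequantize(q, step=8.0)
print(sum(1 for row in q for v in row if v != 0))  # -> 6 coefficients survive
```

Ten of the sixteen coefficients quantize to zero, which is what entropy coding (step 7) then exploits, while the reconstruction error stays bounded by half the quantization step.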
4. Experimental Results
4.1. Dataset
4.2. Driving Abnormal Behavior Detection
4.3. Data Compression
4.4. Blockchain Performance Analysis
4.5. Limitations and Weaknesses
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Xiao, L.; Luo, C.; Yu, T.; Luo, Y.; Wang, M.; Yu, F.; Li, Y.; Tian, C.; Qiao, J. DeepACEv2: Automated Chromosome Enumeration in Metaphase Cell Images using Deep Convolutional Neural Networks. IEEE Trans. Med. Imaging 2020, 39, 3920–3932. [Google Scholar] [CrossRef]
- Meng, Z.; Fan, Z.; Zhao, Z.; Su, F. ENS-Unet: End-to-End Noise Suppression U-Net for Brain Tumor Segmentation. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 5886–5889. [Google Scholar]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. Nuscenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11621–11631. [Google Scholar]
- Han, C.; Zhao, Q.; Zhang, S.; Chen, Y.; Zhang, Z.; Yuan, J. Yolopv2: Better, Faster, Stronger for Panoptic Driving Perception. arXiv 2022, arXiv:2208.11434. [Google Scholar]
- Campbell, M.; Egerstedt, M.; How, J.P.; Murray, R.M. Autonomous driving in Urban Environments: Approaches, Lessons and Challenges. Philos. Trans. R. Soc. A 2010, 368, 4649–4672. [Google Scholar] [CrossRef]
- Muhammad, K.; Ullah, A.; Lloret, J.; Del Ser, J.; de Albuquerque, V.H.C. Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions. IEEE Trans. Intell. Transp. 2020, 22, 4316–4336. [Google Scholar] [CrossRef]
- Ionescu, R.T.; Smeureanu, S.; Popescu, M.; Alexe, B. Detecting Abnormal Events in Video Using Narrowed Normality clusters. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1951–1960. [Google Scholar]
- Deepak, K.; Chandrakala, S.; Mohan, C.K. Residual Spatiotemporal Autoencoder for Unsupervised Video Anomaly Detection. Signal Image Video Process. 2021, 15, 215–222. [Google Scholar] [CrossRef]
- Cho, M.; Kim, T.; Kim, W.J.; Cho, S.; Lee, S. Unsupervised Video Anomaly Detection Via Normalizing Flows with Implicit Latent Features. Pattern Recogn. 2022, 129, 108703. [Google Scholar] [CrossRef]
- Cui, X.; Liu, Q.; Gao, M.; Metaxas, D.N. Abnormal Detection Using Interaction Energy Potentials. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 3161–3167. [Google Scholar]
- Vu, H. Deep Abnormality Detection in Video Data. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 5217–5218. [Google Scholar]
- Tang, H.; Ding, L.; Wu, S.; Ren, B.; Sebe, N.; Rota, P. Deep Unsupervised Key Frame Extraction for Efficient Video Classification. ACM Trans. Multim. Comput. 2023, 19, 1–17. [Google Scholar] [CrossRef]
- Tan, K.; Zhou, Y.; Xia, Q.; Liu, R.; Chen, Y. Large Model Based Sequential Keyframe Extraction for Video Summarization. arXiv 2024, arXiv:2401.04962. [Google Scholar]
- Tian, Z.; Li, M.; Qiu, M.; Sun, Y.; Su, S. Block-DEF: A Secure Digital Evidence Framework Using Blockchain. Inf. Sci. 2019, 491, 151–165. [Google Scholar] [CrossRef]
- Agrawal, T.K.; Kumar, V.; Pal, R.; Wang, L.; Chen, Y. Blockchain-Based Framework for Supply Chain Traceability: A Case Example of Textile and Clothing Industry. Comput. Ind. Eng. 2021, 154, 107130. [Google Scholar] [CrossRef]
- Cebe, M.; Erdin, E.; Akkaya, K.; Aksu, H.; Uluagac, S. Block4forensic: An Integrated Lightweight Blockchain Framework for Forensics Applications of Connected Vehicles. IEEE Commun. Mag. 2018, 56, 50–57. [Google Scholar] [CrossRef]
- Yao, Q.; Li, T.; Yan, C.; Deng, Z. Accident Responsibility Identification Model for Internet of Vehicles Based on Lightweight Blockchain. Comput. Intell. 2023, 39, 58–81. [Google Scholar] [CrossRef]
- Philip, A.O.; Saravanaguru, R.K. Secure Incident & Evidence Management Framework (SIEMF) for Internet of Vehicles Using Deep Learning and Blockchain. Open Comput. Sci. 2020, 10, 408–421. [Google Scholar]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Bhowmik, D.; Feng, T. The Multimedia Blockchain: A Distributed and Tamper-Proof Media Transaction Framework. In Proceedings of the 2017 22nd International Conference on Digital Signal Processing (DSP), London, UK, 23–25 August 2017; pp. 1–5. [Google Scholar]
- Du, W.; Liu, H.; Luo, G.; Zhang, J.; Xu, W. A Consortium Blockchain-Enabled Evidence Sharing System for Public Interest Litigation. J. Glob. Inf. Manag. (JGIM) 2023, 31, 1–19. [Google Scholar] [CrossRef]
- Philip, A.O.; Saravanaguru, R.K. Smart Contract Based Digital Evidence Management Framework over Blockchain for Vehicle Accident Investigation in IoV era. J. King Saud-Univ.-Comput. Inf. Sci. 2022, 34, 4031–4046. [Google Scholar] [CrossRef]
- Philip, A.O.; Saravanaguru, R.K. Multisource Traffic Incident Reporting and Evidence Management in Internet of Vehicles using Machine Learning and Blockchain. Eng. Appl. Artif. Intel. 2023, 117, 105630. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Souček, T.; Lokoč, J. Transnet v2: An Effective Deep Network Architecture for Fast Shot Transition Detection. arXiv 2020, arXiv:2008.04838. [Google Scholar]
- Turletti, T. H.261 Software Codec for Videoconferencing over the Internet. Ph.D. Thesis, INRIA, Paris, France, 1993. [Google Scholar]
- Aramvith, S.; Sun, M.T. MPEG-1 and MPEG-2 Video Standards. In Handbook of Image and Video Processing; 2000; pp. 597–610. Available online: https://preetikale.wordpress.com/wp-content/uploads/2018/07/handbook-of-image-and-video-processing-al-bovik1.pdf (accessed on 20 June 2024).
- Akiyama, T.; Aono, H.; Aoki, K.; Ler, K.; Wilson, B.; Araki, T.; Morishige, T.; Takeno, H.; Sato, A.; Nakatani, S.; et al. MPEG2 Video Codec using Image Compression DSP. IEEE Trans. Consum. Electron. 1994, 40, 466–472. [Google Scholar] [CrossRef]
- Schwarz, H.; Marpe, D.; Wiegand, T. Overview of the Scalable H.264/MPEG4-AVC Extension. In Proceedings of the 2006 International Conference on Image Processing, Atlanta, GA, USA, 8–11 October 2006; pp. 161–164. [Google Scholar]
- Pastuszak, G.; Abramowski, A. Algorithm and Architecture Design of the H.265/HEVC Intra Encoder. IEEE Trans. Circuits Syst. Video Technol. 2015, 26, 210–222. [Google Scholar] [CrossRef]
- Islam, A.; Morol, M.K.; Shin, S.Y. A Federated Learning-Based Blockchain-Assisted Anomaly Detection Scheme to Prevent Road Accidents in Internet of Vehicles. In Proceedings of the 2nd International Conference on Computing Advancements, Dhaka, Bangladesh, 10–12 March 2022; pp. 516–521. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models from Natural Language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, Virtual Event, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Martin, M.; Roitberg, A.; Haurilet, M.; Horne, M.; Reiß, S.; Voit, M.; Stiefelhagen, R. Drive&act: A Multi-Modal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2801–2810. [Google Scholar]
- Fox, A.; Brewer, E.A. Harvest, Yield, and Scalable Tolerant Systems. In Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, Rio Rico, AZ, USA, 28–30 March 1999; pp. 174–178. [Google Scholar]
- Witzel, C.; Gegenfurtner, K.R. Color perception: Objects, Constancy, and Categories. Annu. Rev. Vis. Sci. 2018, 4, 475–499. [Google Scholar] [CrossRef] [PubMed]
- Khayam, S.A. The Discrete Cosine Transform (DCT): Theory and Application; Michigan State University: East Lansing, MI, USA, 2003; Volume 114, p. 31. [Google Scholar]
- Karczewicz, M.; Niewegłowski, J.; Haavisto, P. Video Coding Using Motion Compensation with Polynomial Motion Vector Fields. Signal Process-Image 1997, 10, 63–91. [Google Scholar]
- Chen, Y.; Mukherjee, D.; Han, J.; Grange, A.; Xu, Y.; Liu, Z.; Parker, S.; Chen, C.; Su, H.; Joshi, U.; et al. An Overview of Core Coding Tools in the AV1 Video Codec. In Proceedings of the 2018 Picture Coding Symposium (PCS), San Francisco, CA, USA, 24–27 June 2018; pp. 41–45. [Google Scholar]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for Mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
- Marpe, D.; Wiegand, T.; Sullivan, G.J. The H.264/MPEG4 Advanced Video Coding Standard and Its Applications. IEEE Commun. Mag. 2006, 44, 134–143. [Google Scholar] [CrossRef]
- Fu, T.; Zhang, H.; Mu, F.; Chen, H. Fast CU Partitioning Algorithm for H.266/VVC Intra-Frame Coding. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 55–60. [Google Scholar]
- Hore, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
- Chainmaker. Available online: https://chainmaker.org.cn/home (accessed on 28 June 2024).
Backbone | Parameters | | | | | | | Top3
---|---|---|---|---|---|---|---|---
CLIP(Frozen) [36] | - | 0.458 | 0.447 | 0.401 | 0.652 | 0.589 | 0.550 | 0.860 |
MBNetV3-S [43] | 5 M | 0.525 | 0.489 | 0.472 | 0.666 | 0.535 | 0.562 | 0.839 |
MBNetV3-L [43] | 14 M | 0.609 | 0.648 | 0.570 | 0.732 | 0.641 | 0.642 | 0.810 |
ResNet-18 [1] | 43 M | 0.678 | 0.722 | 0.673 | 0.758 | 0.736 | 0.727 | 0.911 |
ResNet-34 [1] | 82 M | 0.640 | 0.687 | 0.630 | 0.789 | 0.755 | 0.756 | 0.895 |
ResNet-50 [1] | 100 M | 0.743 | 0.795 | 0.754 | 0.827 | 0.803 | 0.805 | 0.926 |
Method | |||||||
---|---|---|---|---|---|---|---|
Random Frame | 0.693 | 0.682 | 0.678 | 0.778 | 0.792 | 0.780 | 0.937 |
Center Frame | 0.740 | 0.776 | 0.718 | 0.834 | 0.797 | 0.795 | 0.925 |
Cluster Frame | 0.743 | 0.795 | 0.754 | 0.827 | 0.803 | 0.805 | 0.926 |
Indicators | WV | ED | Work | RWMN | OOJ | FSB | OOS |
---|---|---|---|---|---|---|---|
Precision | 0.784 | 0.668 | 0.982 | 0.950 | 0.661 | 0.812 | 0.324 |
Recall | 0.909 | 0.593 | 0.668 | 0.915 | 0.933 | 0.928 | 0.571 |
F1 | 0.842 | 0.628 | 0.795 | 0.932 | 0.774 | 0.866 | 0.413 |
Coder | Size (MB) | Compress Rate | Bit Rate (Kbps) | Coding Time (s) | Code Rate (MB/s) | PSNR (dB) | SSIM |
---|---|---|---|---|---|---|---|
Original | 762 | 1.00 | 278,171 | / | / | / | / |
MPEG2 [32] | 89.8 | 8.49 | 32,750 | 70.69 | 10.78 | 33.601 | 0.9573 |
MPEG4 [33] | 78 | 9.77 | 28,470 | 71.34 | 10.68 | 32.9286 | 0.9441 |
H.264_MF [44] | 35 | 21.77 | 12,789 | 73.98 | 10.30 | 26.1753 | 0.8605 |
LIBX265 [34] | 6.99 | 109.01 | 2415 | 968.19 | 0.79 | 45.3921 | 0.9812 |
AV1_NV | 4.22 | 180.57 | 1533 | 82.90 | 9.19 | 43.3955 | 0.9769 |
LIBSVTAV1 | 7.27 | 104.81 | 2645 | 121.79 | 6.26 | 45.9398 | 0.9825 |
H.266 [45] | 67.19 | 11.34 | 100,965 | 1933 | 0.63 | 45.2085 | 0.9801 |
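The PSNR figures reported in the table above can be computed as follows; this is a minimal sketch over flat pixel lists, whereas production code would operate on full image arrays.

```python
import math

def psnr(orig, recon, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images,
    given as flat lists of pixel values in [0, max_val]."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Each pixel off by 1 -> MSE of 1 -> PSNR of 10*log10(255^2) dB.
print(round(psnr([0, 128, 255, 64], [1, 127, 254, 65]), 2))  # -> 48.13
```

Higher is better: the ~45 dB rows in the table indicate near-transparent quality, while the H.264_MF row's 26 dB reflects visible degradation at its much lower bit rate.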
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shi, T.; Wu, R.; Zhou, C.; Zheng, S.; Meng, Z.; Cui, Z.; Huang, J.; Ren, C.; Zhao, Z. A Trusted Supervision Paradigm for Autonomous Driving Based on Multimodal Data Authentication. Big Data Cogn. Comput. 2024, 8, 100. https://doi.org/10.3390/bdcc8090100