Enhancing the Minimum Awareness Failure Distance in V2X Communications: A Deep Reinforcement Learning Approach
Abstract
1. Introduction
- We present a new definition of awareness in V2X communications, framing it as a multi-faceted concept involving vehicle detection, tracking, and safe distances.
- We propose the DRL-JCBRTP framework, a model-free DRL approach to improving cooperative vehicle awareness in V2X communications, which integrates LSTM-based actor networks and MLP-based critic networks within the SAC framework to effectively learn optimal policies.
- We design an innovative reward function that leverages local state information to meticulously address multiple aspects of vehicle awareness.
- We develop the SLMLab-Gym-VEINS simulation framework to demonstrate the superior performance of the DRL-JCBRTP scheme in terms of awareness failure probability and distance, which ultimately improves driving safety.
2. Related Works
3. Our Definition of Awareness in V2X Communications
- Vehicle detection refers to the ability to recognize the presence of a neighbor vehicle. If the ego vehicle receives at least one beacon from a neighbor vehicle within the detection time window, the neighbor vehicle is considered successfully detected; otherwise, its detection is deemed to have failed.
- Vehicle tracking involves continuously monitoring and updating the kinematic states of a neighbor vehicle. If the tracking error is within the tracking-error threshold, the neighbor vehicle is considered successfully tracked; otherwise, its tracking is considered to have failed.
- Minimum awareness failure distance (mAFD) is the distance between the ego vehicle and the nearest neighbor vehicle that has failed to be detected or tracked. Increasing the mAFD is crucial for maintaining road safety and preventing accidents.
- Minimum detection failure distance (mDFD) is the distance to the nearest undetected vehicle. This measure helps assess the critical proximity at which the ego vehicle is unaware of nearby vehicles, posing the most significant safety risk.
- Minimum tracking failure distance (mTFD) is the distance to the nearest untracked vehicle. This measure evaluates the risk associated with vehicles that, although detected, have their kinematic information inaccurately monitored, potentially leading to unsafe driving maneuvers.
- Maximum tracking error (MTE) is the largest tracking error over all tracked neighbor vehicles. While it does not pose a risk of accidents, this measure indicates the upper bound of the tracking inaccuracy of tracked neighbor vehicles.
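The four awareness measures defined above can be sketched in a few lines of Python. This is an illustrative reconstruction only: the neighbor record layout, function name, and the 1.5 m tracking-error threshold (taken from the parameter table, assumed to be the tracking threshold) are placeholders, not the paper's code.

```python
import math

def awareness_metrics(ego_pos, neighbors):
    """Compute (mDFD, mTFD, MTE, mAFD) from hypothetical neighbor records.

    Each neighbor is a dict with keys:
      'pos'       -- ground-truth position (x, y) in meters,
      'detected'  -- True if at least one beacon was received in the window,
      'track_err' -- tracking error in meters (None if never detected).
    A neighbor counts as tracked when detected and track_err <= TRACK_ERR_TH.
    """
    TRACK_ERR_TH = 1.5  # assumed tracking-error threshold [m]

    def dist(p):
        return math.hypot(p[0] - ego_pos[0], p[1] - ego_pos[1])

    # mDFD: distance to the nearest undetected vehicle
    mDFD = min((dist(n['pos']) for n in neighbors if not n['detected']),
               default=math.inf)
    # mTFD: distance to the nearest detected-but-untracked vehicle
    mTFD = min((dist(n['pos']) for n in neighbors
                if n['detected'] and n['track_err'] > TRACK_ERR_TH),
               default=math.inf)
    # MTE: largest tracking error over all successfully tracked neighbors
    tracked_errs = [n['track_err'] for n in neighbors
                    if n['detected'] and n['track_err'] <= TRACK_ERR_TH]
    MTE = max(tracked_errs, default=0.0)
    # mAFD: nearest vehicle that failed either detection or tracking
    mAFD = min(mDFD, mTFD)
    return mDFD, mTFD, MTE, mAFD
```

Note that mAFD is simply the minimum of mDFD and mTFD, since an awareness failure is either a detection failure or a tracking failure.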
4. Our DRL-JCBRTP Framework
4.1. System Model
4.2. DRL-JCBRTP NN Architecture Design Based on the SAC Algorithm
4.3. MDP Specifications of Our DRL-JCBRTP
- 1. Agent, tasked with learning the optimal policy and executing actions, is represented by individual vehicles in the road environment. These vehicles observe their local state information to adjust their BR and TP accordingly, aiming to fulfill diverse objectives. This setup fosters a decentralized framework, where every vehicle independently makes a decision informed by its specific state information. Despite the decentralized nature of decision-making, we assume that the same underlying policy is used across all vehicles, ensuring a coherent and unified strategy for managing BR and TP on the road.
- 2. Environment serves as the complex and dynamic setting in which the agent operates, providing critical feedback on the agent’s actions and influencing its subsequent state. In addition, it plays a pivotal role in delivering essential state information and issuing rewards or penalties based on the agent’s decisions. This feedback loop is essential for steering the agent toward optimal behaviors, enabling it to adeptly adjust to and manage the continuously changing road and traffic conditions.
- 3. Actions, allowing the agent to react to the environment, are denoted by a ∈ A, where A is the set of possible actions. At each frame, the agent evaluates its current state to decide on a beacon broadcast, thus affecting the BR and TP. These actions are structured as a pair a = (b, p) consisting of the decision to transmit a beacon (b) and the TP (p), where b ∈ {0, 1} indicates whether to broadcast a beacon frame (1) or not (0), and, if b = 1, an appropriate TP p in dBm is chosen to mitigate channel congestion. Notably, all these actions conform to access-layer standards, ensuring regulatory compliance [22,24,26,27].
- 4. The state s ∈ S is a vector consisting of the internal parameters of the agent and its observation of the environment, where S represents the complete set of potential states. This state, forming the critical basis for action selection by the agent, is exclusively built from locally available information, which includes:
- Vehicle kinematics: the speed (VS) and acceleration (VA) of the ego vehicle obtained from the vehicle kinematics estimator.
- CBR: the ratio of channel occupancy at the previous frame obtained from the access radio.
- Number of neighbor vehicles (NNV): number of distinct vehicles whose beacons are successfully received in the last 20 frames.
- Average Beacon Interval (ABI): average beacon interval between successively received beacons over all neighbor vehicles, quantified in the unit of frame duration T.
Then, the state of the agent is represented by the 5-tuple vector s = (VS, VA, CBR, NNV, ABI).
- 5. Rewards play a pivotal role in guiding the learning process of the agent. Upon executing an action a at time t, the agent transitions from its current state s to a new state s′, during which it receives a reward r(s, a). This reward, a critical form of feedback from the environment, is formulated as a function of both the state s and the action a. The specific structure and implications of this reward mechanism, which fundamentally influences the agent’s behavior, will be elaborated upon in the following section.
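The MDP elements above can be sketched as plain data structures. The following Python fragment is illustrative only: the field names, the discrete TP grid (taken from the simulation parameter table), and the validity checks are assumptions of this sketch, not the paper's implementation.

```python
from dataclasses import dataclass

# Allowed transmit powers in dBm, matching the simulation parameter table.
TP_LEVELS_DBM = (0, 5, 10, 15, 20, 25, 30)

@dataclass
class State:
    """The 5-tuple state observed locally by each vehicle agent."""
    vs: float   # vehicle speed (VS), m/s
    va: float   # vehicle acceleration (VA), m/s^2
    cbr: float  # channel busy ratio in the previous frame, in [0, 1]
    nnv: int    # distinct neighbors whose beacons were heard in the last 20 frames
    abi: float  # average beacon interval, in units of the frame duration T

    def as_vector(self):
        # Flat 5-tuple fed to the actor network.
        return (self.vs, self.va, self.cbr, self.nnv, self.abi)

@dataclass
class Action:
    """Action a = (b, p): transmit decision plus transmit power."""
    b: int          # 1 = broadcast a beacon this frame, 0 = stay silent
    p_dbm: int = 0  # transmit power; only meaningful when b == 1

    def __post_init__(self):
        assert self.b in (0, 1)
        if self.b == 1:
            assert self.p_dbm in TP_LEVELS_DBM
```

In this decentralized setup, every vehicle would evaluate its own `State` each frame and sample an `Action` from the shared policy.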
4.4. Design Principle of Reward Functions
- mDFD cost: The mDFD cost is the penalty applied to the distance d to the nearest undetected vehicle. Since the ego vehicle is unaware of this undetected vehicle, an mDFD no greater than the detection distance threshold poses the most significant safety risk. To reflect this severity, the highest penalty value is set to the corresponding weight in (4). Since the risk of a V2V collision quickly decreases as the distance between the colliding vehicles increases, the cost decreases exponentially once the distance exceeds the threshold.
- mTFD cost: The mTFD cost is the penalty imposed on the distance d to the nearest untracked vehicle. Although this vehicle has already been detected, an mTFD below the tracking distance threshold potentially leads to a V2V collision. Similar to the mDFD cost, if the distance is no greater than the threshold, a relatively high constant weight is allocated; otherwise, the cost decreases exponentially with the relative distance.
- MTE cost: The MTE cost penalizes the maximum tracking error over all tracked neighbor vehicles, which by definition is always less than the tracking-error threshold. Although it does not incur a risk of collision, the MTE cost can be interpreted as the upper bound of the tracking inaccuracy of the tracked neighbor vehicles.
- Beacon interval cost: The beacon interval cost informs the agent about the elapsed time since its last beacon broadcast, which is crucial for ensuring the freshness of kinematic information updates. It is proportional to the ABI, quantifying the beacon aging.
- Congestion cost: The congestion cost encourages the agent to make judicious decisions regarding beacon transmission in relation to both the current CBR and the TP level.
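Taken together, the design principles above suggest a total reward of the form: a constant penalty while a failed vehicle lies within a distance threshold, exponential decay beyond it, plus aging and congestion terms. The Python sketch below is illustrative only: the function names, weights, thresholds, and decay rates are placeholders loosely aligned with the parameter table, not the paper's calibrated reward.

```python
import math

def mdfd_cost(d, w=48.0, d_th=180.0, decay=0.01):
    """Constant penalty w while the nearest undetected vehicle is within
    d_th meters; exponential decay beyond it (all values are placeholders)."""
    return w if d <= d_th else w * math.exp(-decay * (d - d_th))

def mtfd_cost(d, w=12.0, d_th=90.0, decay=0.01):
    """Same shape as mdfd_cost, with a lower weight since the vehicle
    has at least been detected."""
    return w if d <= d_th else w * math.exp(-decay * (d - d_th))

def reward(d_det, d_trk, mte, abi, cbr, b,
           w_mte=1.0, w_abi=0.7, w_cbr=5.0):
    """Negative weighted sum of the five cost components described above.

    d_det -- distance to the nearest undetected vehicle (mDFD)
    d_trk -- distance to the nearest untracked vehicle (mTFD)
    mte   -- maximum tracking error over tracked neighbors
    abi   -- average beacon interval, in frame durations
    cbr   -- channel busy ratio in the previous frame
    b     -- 1 if a beacon was transmitted this frame, else 0
    """
    c_interval = w_abi * abi       # beacon aging grows with the ABI
    c_congest = w_cbr * cbr * b    # congestion is only charged when transmitting
    return -(mdfd_cost(d_det) + mtfd_cost(d_trk)
             + w_mte * mte + c_interval + c_congest)
```

Under this shape, transmitting on a busy channel lowers the reward through the congestion term, while staying silent for too long lowers it through the growing ABI term, which is the trade-off the agent must learn to balance.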
5. Numerical Results
5.1. Simulation Framework, Training, Testing, and Deployments
5.2. Weight and Parameter Setting of Our Reward Components
5.3. Impacts of DRL-JCBRTP NN Architectures on Beaconing Performance
5.4. Performance Comparison with Other Beaconing Schemes
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Abdel-Rahim, A.; Bai, F.; Bai, S.; Basnayake, C.; Bellur, B.; Brown, G.; Brovold, S.; Caminiti, L.; Cunningham, D.; Elzein, H.; et al. Vehicle Safety Communications—Applications (VSC-A) Final Report; Tech. Rep. DOT HS 811492; U.S. Department of Transportation: Washington, DC, USA, 2011. [Google Scholar]
- National Highway Traffic Safety Administration and Department of Transportation. Federal Motor Vehicle Safety Standards; Tech. Rep.; V2V Communications: Washington, DC, USA, 2017. [Google Scholar]
- ETSI EN 302 637-2 V1.4.1; Intelligent Transport Systems (ITS); Vehicular Communications; Basic Set of Applications; Part 2: Specification of Cooperative Awareness Basic Service. ETSI: Sophia Antipolis, France, 2019.
- SAE J2735; Dedicated Short Range Communications (DSRC) Message Set Dictionary. SAE International: Warrendale PA, USA, 2023.
- ETSI EN 302 637-3 V1.3.1; Intelligent Transport Systems (ITS); Vehicular Communications; Basic Set of Applications; Part 3: Specification of Decentralized Environmental Notification Basic Service. ETSI: Sophia Antipolis, France, 2019.
- Torrent-Moreno, M.; Mittag, J.; Santi, P.; Hartenstein, H. Vehicle-to-vehicle communication: Fair transmit power control for safety-critical information. IEEE Trans. Veh. Technol. 2009, 58, 3684–3703. [Google Scholar] [CrossRef]
- Bansal, G.; Kenney, J.B.; Rohrs, C.E. LIMERIC: A linear adaptive message rate algorithm for DSRC congestion control. IEEE Trans. Veh. Technol. 2013, 62, 4182–4197. [Google Scholar] [CrossRef]
- Sommer, C.; Joerer, S.; Segata, M.; Tonguz, O.K.; Cigno, R.L.; Dressler, F. How shadowing hurts vehicular communications and how dynamic beaconing can help. IEEE Trans. Mob. Comput. 2015, 14, 1411–1421. [Google Scholar] [CrossRef]
- Dayal, A.; Colbert, E.; Marojevic, V.; Reed, J. Risk Controlled Beacon Transmission in V2V Communications. In Proceedings of the 2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring), Kuala Lumpur, Malaysia, 28 April–1 May 2019; pp. 1–6. [Google Scholar] [CrossRef]
- Hajiaghajani, F.; Qiao, C. Tailgating Risk-Aware Beacon Rate Adaptation for Distributed Congestion Control in VANETs. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Big Island, HI, USA, 9–13 December 2019; pp. 1–6. [Google Scholar] [CrossRef]
- ETSI. Intelligent Transport Systems (ITS)—Decentralized Congestion Control Mechanisms for Intelligent Transport Systems Operating in the 5 GHz Range, Access Layer Part; Technical Report ETSI TS 102 687 V1.2.1; European Telecommunications Standards Institute (ETSI): Nice, France, 2018. [Google Scholar]
- Goudarzi, F.; Asgari, H.; Al-Raweshidy, H.S. Fair and stable joint beacon frequency and power control for connected vehicles. Wirel. Netw. 2019, 25, 4979–4990. [Google Scholar] [CrossRef]
- Aznar-Poveda, J.; Garcia-Sanchez, A.J.; Egea-Lopez, E.; Garcia-Haro, J. MDPRP: A Q-Learning Approach for the Joint Control of Beaconing Rate and Transmission Power in VANETs. IEEE Access 2021, 9, 10166–10178. [Google Scholar] [CrossRef]
- Egea-Lopez, E.; Pavon-Mariño, P.; Santa, J. Optimal Joint Power and Rate Adaptation for Awareness and Congestion Control in Vehicular Networks. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25033–25046. [Google Scholar] [CrossRef]
- Huang, C.L.; Fallah, Y.P.; Sengupta, R.; Krishnan, H. Intervehicle Transmission Rate Control for Cooperative Active Safety System. IEEE Trans. Intell. Transp. Syst. 2011, 12, 645–658. [Google Scholar] [CrossRef]
- Bansal, G.; Lu, H.; Kenney, J.B.; Poellabauer, C. EMBARC: Error model based adaptive rate control for vehicle-to-vehicle communications. In Proceedings of the 10th ACM International Workshop on Vehicular Inter-networking, Systems, and Applications (VANET), Taipei, Taiwan, 25 June 2013; pp. 41–50. [Google Scholar] [CrossRef]
- Nguyen, H.H.; Jeong, H.Y. Mobility-Adaptive Beacon Broadcast for Vehicular Cooperative Safety-Critical Applications. IEEE Trans. Intell. Transp. Syst. 2018, 19, 1996–2010. [Google Scholar] [CrossRef]
- Aznar-Poveda, J.; Garcia-Sanchez, A.J.; Egea-Lopez, E.; García-Haro, J. Simultaneous Data Rate and Transmission Power Adaptation in V2V Communications: A Deep Reinforcement Learning Approach. IEEE Access 2021, 9, 122067–122081. [Google Scholar] [CrossRef]
- Aznar-Poveda, J.; García-Sánchez, A.J.; Egea-López, E.; García-Haro, J. Approximate reinforcement learning to control beaconing congestion in distributed networks. Sci. Rep. 2022, 12, 142. [Google Scholar] [CrossRef] [PubMed]
- Liu, X.; Amour, B.S.; Jaekel, A. A Reinforcement Learning-Based Congestion Control Approach for V2V Communication in VANET. Appl. Sci. 2023, 13, 3640. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv 2018, arXiv:1801.01290. [Google Scholar]
- Intelligent Transport Systems (ITS). ITS-G5 Access Layer Specification for Intelligent Transport Systems Operating in the 5 GHz Frequency Band; Intelligent Transport Systems: Chichester, UK, 2019. [Google Scholar]
- IEEE Std 1609.0-2019 (Revision of IEEE Std 1609.0-2013); IEEE Guide for Wireless Access in Vehicular Environments (WAVE) Architecture. IEEE: Piscataway, NJ, USA, 2019; pp. 1–106. [CrossRef]
- IEEE Std 802.11-2020 (Revision of IEEE Std 802.11-2016); IEEE Standard for Information Technology–Telecommunications and Information Exchange between Systems—Local and Metropolitan Area Networks–Specific Requirements—Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE: Piscataway, NJ, USA, 2021; pp. 1–4379. [CrossRef]
- Fouda, A.; Berry, R.; Vukovic, I. Interleaved One-shot Semi-Persistent Scheduling for BSM Transmissions in C-V2X Networks. In Proceedings of the 2021 IEEE Vehicular Networking Conference (VNC), Virtual Event, 10–12 November 2021; pp. 143–150. [Google Scholar] [CrossRef]
- Rolich, A.; Turcanu, I.; Vinel, A.; Baiocchi, A. Understanding the impact of persistence and propagation on the Age of Information of broadcast traffic in 5G NR-V2X sidelink communications. Comput. Netw. 2024, 248, 110503. [Google Scholar] [CrossRef]
- Saifuddin, M.; Zaman, M.; Fallah, Y.P.; Rao, J. Addressing Rare Outages in C-V2X With Time-Controlled One-Shot Resource Scheduling. IEEE Open J. Intell. Transp. Syst. 2024, 5, 208–222. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A. Reinforcement Learning; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
- Keng, W.L.; Graesser, L. Foundations of Deep Reinforcement Learning; Addison-Wesley Professional: Boston, MA, USA, 2019. [Google Scholar]
- Ladosz, P.; Weng, L.; Kim, M.; Oh, H. Exploration in deep reinforcement learning: A survey. Inf. Fusion 2022, 85, 1–22. [Google Scholar] [CrossRef]
- Antonyshyn, L.; Givigi, S. Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards. J. Intell. Robot. Syst. 2024, 110, 100. [Google Scholar] [CrossRef]
- Schettler, M.; Buse, D.S.; Zubow, A.; Dressler, F. How to Train your ITS? Integrating Machine Learning with Vehicular Network Simulation. In Proceedings of the 12th IEEE Vehicular Networking Conference (VNC 2020), Virtual Conference, New York, NY, USA, 16–18 December 2020. [Google Scholar] [CrossRef]
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
- Lopez, P.A.; Behrisch, M.; Bieker-Walz, L.; Erdmann, J.; Flötteröd, Y.P.; Hilbrich, R.; Lücken, L.; Rummel, J.; Wagner, P.; Wiessner, E. Microscopic Traffic Simulation using SUMO. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2575–2582. [Google Scholar] [CrossRef]
- Varga, A. The OMNeT++ Discrete Event Simulation System. In Proceedings of the European Simulation Multiconference (ESM’2001), Prague, Czech Republic, 7–9 June 2001. [Google Scholar]
- Loon, K.W.; Graesser, L.; Cvitkovic, M. SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning. arXiv 2019, arXiv:1912.12482. [Google Scholar]
- Cho, S. Signal Outages of CSMA/CA-Based Wireless Networks with Different AP Densities. Sustainability 2018, 10, 1483. [Google Scholar] [CrossRef]
- Federal Highway Administration (FHWA). Traffic Flow Theory; Technical Report; Federal Highway Administration: Washington, DC, USA, 1992. [Google Scholar]
- Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010; Proceedings of Machine Learning Research. Teh, Y.W., Titterington, M., Eds.; Volume 9, pp. 249–256. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the ICLR (Poster), San Diego, CA, USA, 7–9 May 2015; Bengio, Y., LeCun, Y., Eds.; [Google Scholar]
- Polyak, B. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
| General Parameters | Value | Unit |
|---|---|---|
| Road length | 2 | km |
| Vehicle density | 100, 160, 220, 280, 340, 400 | veh/km |
| Vehicle speed | 30 | m/s |
| Inter-vehicle distance | 40, 60, 80 | m |
| Channel frequency | 5.89 | GHz |
| Data rate | 6 | Mbps |
| Transmit power | 0, 5, 10, 15, 20, 25, 30 | dBm |
| Receiver sensitivity | −95 | dBm |
| Message size | 400 | bytes |
| Frame duration (T) | 100 | ms |

| Reward Function Parameters | Value | Unit |
|---|---|---|
| | 48 | |
| | 12 | |
| | 1 | |
| | 7 | |
| | 1.5 | m |
| | 180 | m |
| | 90 | m |
| | 500 | m |
| | 0.5 | |
| | 3 | |
| | 0.7 | |
| | 5 | |
| | 0.01 | |
| | 0 | dBm |
| | 5 | dBm |

| SAC Parameters | Value |
|---|---|
| Weight initialization | random |
| Optimizer | Adam |
| Learning rate | |
| Batch size | 512 |
| | 0.99 |
Share and Cite
Guzmán Leguel, A.K.; Nguyen, H.-H.; Gómez Gutiérrez, D.; Yoo, J.; Jeong, H.-Y. Enhancing the Minimum Awareness Failure Distance in V2X Communications: A Deep Reinforcement Learning Approach. Sensors 2024, 24, 6086. https://doi.org/10.3390/s24186086