Learn to Bet: Using Reinforcement Learning to Improve Vehicle Bids in Auction-Based Smart Intersections
Abstract
1. Introduction
1.1. Related Work
1.1.1. General Coordination of Autonomous Vehicles
1.1.2. Intersection Management by Auctions
1.1.3. Learning
2. Materials and Methods
2.1. Scenario
2.2. The Proposal
2.3. Reinforcement Model
- The Position Reward depends on the outcome of the vehicle’s bid or sponsorship. If the vehicle participates in an auction and wins, it receives a positive reward; if it loses, it receives a negative one. Otherwise, if the vehicle sponsors another one, it receives a reward proportional to its advancement in the lane;
- The Discount Reward is the fraction of the standard bid saved thanks to the applied discount. High values are obtained when the bidder applies convenient discounts.
where:
- the initial position of the agent is its position at the end of the previous iteration;
- the final position of the agent is its position at the current iteration;
- $w$ is a positive parameter representing the reinforcement signal when the vehicle wins an auction;
- $\ell$ is a negative parameter representing the reinforcement signal when the vehicle does not win an auction;
- $d$ is the multiplier returned by the bidder for the selected action. For example, a 90% discount on a bid corresponds to a multiplier of 0.1 returned by the bidder, which translates to a discount reward ($1 - d$) of 0.9, which is high;
- a tuning parameter is used to balance the position reward and the discount reward (a minimal sketch of the full reward computation follows this list).
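To make the two components concrete, the following minimal Python sketch combines them. Only $w$, $\ell$, the multiplier $d$, and the idea of balancing the two components come from the text above; the exact combination rule, the sign convention for lane positions, and all function names are illustrative assumptions, not the paper's implementation.

```python
from typing import Optional

def position_reward(won: Optional[bool], pos_start: float, pos_end: float,
                    w: float = 2.0, ell: float = -0.3) -> float:
    """Position component: w on an auction win, ell on a loss; for a
    sponsorship (won is None), a value proportional to the advancement in
    the lane. Queue positions are assumed to count down toward the
    intersection, so start - end > 0 means the vehicle moved forward."""
    if won is True:
        return w
    if won is False:
        return ell
    return pos_start - pos_end            # sponsorship: lane advancement

def discount_reward(d: float) -> float:
    """Fraction of the standard bid saved: a multiplier of 0.1 (a 90%
    discount) yields a high reward of 0.9."""
    return 1.0 - d

def total_reward(won: Optional[bool], pos_start: float, pos_end: float,
                 d: float, balance: float = 0.3) -> float:
    # Assumed mix: the balancing parameter weights the discount component.
    return position_reward(won, pos_start, pos_end) + balance * discount_reward(d)

# Example: the vehicle wins an auction after bidding with multiplier 0.1.
print(total_reward(won=True, pos_start=5, pos_end=0, d=0.1))  # 2.0 + 0.3 * 0.9 = 2.27
```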
2.4. Neural Network Architecture
2.5. Training
Algorithm 1: Q-learning algorithm

Input: untrained Q (Q-network) and T (target network).
Variables: $a_t$: action executed at time $t$; $s_t$: state of the environment at time $t$; $R$: reward obtained by executing $a_t$ in state $s_t$; $M$: experience replay memory; $F$: training frequency; $K$: target-network update frequency.
Output: trained Q-network and target network.

1: $i \leftarrow 0$ ▹ sample counter
2: $j \leftarrow 0$ ▹ training counter
3: for each bid or sponsorship made by the test vehicle do
4:   Observe $s_t$;
5:   Pick a random action $a_t$ with probability $\epsilon$, else choose $a_t = \arg\max_a Q(s_t, a)$;
6:   Perform action $a_t$ (apply multiplier on either bid or sponsorship);
7:   Observe $s_{t+1}$ and reward $R$;
8:   M.append($(s_t, a_t, R, s_{t+1})$);
9:   $i \leftarrow i + 1$;
10:  if $i \bmod F = 0$ then
11:    $j \leftarrow j + 1$;
12:    Experience replay(Q, M); ▹ train Q using Algorithm 2
13:    if $j \bmod K = 0$ then
14:      Update target network; ▹ copy weights of Q in T
15:    end if
16:  end if
17: end for
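A compact Python rendering of Algorithm 1 follows; it is a sketch, not the authors' code. PyTorch, the MLP stand-in for the Q-network, the network sizes, the exploration rate, the memory capacity, and the `env.observe()`/`env.step()` interface are all assumptions; `F_TRAIN` and `K_UPDATE` match the training and update frequencies from the parameter table. The `experience_replay` call is sketched after Algorithm 2 below.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 10     # hypothetical sizes; the paper does not fix them here
EPSILON = 0.1                    # exploration rate (value assumed)
F_TRAIN, K_UPDATE = 10, 10       # training frequency F and update frequency K

def make_net() -> nn.Module:
    # Small MLP stand-in for the Q-network; the actual architecture is in Section 2.4.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_net = make_net()                               # Q
target_net = make_net()                          # T
target_net.load_state_dict(q_net.state_dict())   # T starts as a copy of Q
memory: deque = deque(maxlen=10_000)             # replay memory M (capacity assumed)

def select_action(state: torch.Tensor) -> int:
    """Line 5: random action with probability epsilon, else argmax_a Q(s, a)."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def training_loop(env, n_events: int) -> None:
    """One iteration per bid or sponsorship made by the test vehicle."""
    i = j = 0                                     # sample and training counters
    for _ in range(n_events):
        s = env.observe()                         # line 4 (env interface is hypothetical)
        a = select_action(s)                      # line 5
        r, s_next = env.step(a)                   # lines 6-7: apply the multiplier, get R
        memory.append((s, a, r, s_next))          # line 8
        i += 1
        if i % F_TRAIN == 0:                      # every F samples...
            j += 1
            experience_replay(q_net, target_net, memory)   # ...train Q (Algorithm 2)
            if j % K_UPDATE == 0:                 # every K trainings...
                target_net.load_state_dict(q_net.state_dict())  # ...copy Q into T
```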
Algorithm 2: Experience replay

Input: Q: Q-network; M: experience replay memory, containing tuples $(s_t, a_t, R, s_{t+1})$.
Variables: $b$ (batch size), $\gamma$ (discount factor), the learning rate, and T (target network).

The procedure samples a random minibatch of $b$ tuples from $M$; for each tuple it computes the bootstrapped target $y = R + \gamma \max_{a} T(s_{t+1}, a)$ and updates the weights of Q by one gradient-descent step on the squared error $(y - Q(s_t, a_t))^2$.
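Continuing the sketch above, here is a plausible minibatch update for Algorithm 2, in the spirit of DQN [32,40]. The batch size and discount factor values come from the parameter table; the learning rate value and the use of an MSE loss are assumptions (the paper cites Adam [39] for optimization).

```python
import random

import torch
import torch.nn.functional as F

GAMMA, BATCH_SIZE = 0.3, 32   # discount factor and batch size b from the parameter table
LEARNING_RATE = 1e-3          # value assumed; the optimizer choice (Adam) follows [39]
optimizer = torch.optim.Adam(q_net.parameters(), lr=LEARNING_RATE)

def experience_replay(q_net, target_net, memory) -> None:
    """One minibatch update of Q against the frozen target network T."""
    if len(memory) < BATCH_SIZE:
        return                                            # not enough samples yet
    batch = random.sample(list(memory), BATCH_SIZE)
    s = torch.stack([t[0] for t in batch])                # states s_t
    a = torch.tensor([t[1] for t in batch])               # actions a_t
    r = torch.tensor([t[2] for t in batch], dtype=torch.float32)
    s_next = torch.stack([t[3] for t in batch])           # states s_{t+1}

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s_t, a_t)
    with torch.no_grad():                                 # targets do not backprop
        y = r + GAMMA * target_net(s_next).max(dim=1).values
    loss = F.mse_loss(q_sa, y)                            # squared TD error

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```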
2.6. Experiments
2.6.1. Set Up
2.6.2. Preliminary Experiments
- Routine Vehicles perform the same routine every day (same trip at the same time), e.g., they simulate drivers going to work in the morning and back home in the afternoon. The parameters associated with these vehicles do not change across simulations.
- Random Vehicles do not have a repetitive or predictable behavior, e.g., they simulate vehicles that are passing through the block or city. The parameters associated with these vehicles, including the maximum speed v, are initialized randomly at each simulation.
- Saved Budget percentage (S), i.e., how much budget is left at the end of a trip, expressed as a percentage of the initial budget.
- Waiting times, in particular (a sketch of all three metrics follows this list):
  - Traffic Waiting Time (TWT): the time spent waiting in lanes, i.e., from when a vehicle enters a lane until it exits it. The per-vehicle counter measuring TWT is incremented only while the vehicle’s speed is approaching zero (meaning it is waiting in traffic).
  - Crossing Waiting Time (CWT): the time spent waiting at the intersection site as the first vehicle of the lane, i.e., from when the vehicle becomes the first of its lane to when it crosses the intersection.
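The following Python sketch shows how these metrics could be computed from per-vehicle logs. The `TripLog` data model, the stopped-speed threshold, and the tick-based bookkeeping are illustrative assumptions; only the definitions of S, TWT, and CWT follow the list above.

```python
from dataclasses import dataclass, field
from typing import List

STOPPED = 0.1  # m/s; threshold for "speed approaching zero" (value assumed)

@dataclass
class TripLog:
    """Per-vehicle quantities collected during one simulated trip
    (field names are illustrative; the paper does not prescribe a data model)."""
    initial_budget: float
    final_budget: float
    lane_speed_samples: List[float] = field(default_factory=list)  # speed per tick in a lane
    first_in_lane_ticks: int = 0      # ticks spent as first vehicle at the intersection
    tick_seconds: float = 1.0         # simulation step length

def saved_budget_pct(t: TripLog) -> float:
    """S: budget left at the end of the trip, as % of the initial budget."""
    return 100.0 * t.final_budget / t.initial_budget

def traffic_waiting_time(t: TripLog) -> float:
    """TWT: time spent (almost) stopped while queueing in a lane."""
    return sum(t.tick_seconds for v in t.lane_speed_samples if v < STOPPED)

def crossing_waiting_time(t: TripLog) -> float:
    """CWT: time spent as the first vehicle of the lane before crossing."""
    return t.first_in_lane_ticks * t.tick_seconds

# Example: 30% of the budget spent, 12 stopped ticks, 5 ticks as first vehicle.
log = TripLog(100.0, 70.0, [0.0] * 12 + [8.0] * 20, first_in_lane_ticks=5)
print(saved_budget_pct(log), traffic_waiting_time(log), crossing_waiting_time(log))
```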
2.6.3. Experiments
3. Results
3.1. Preliminary Experiments
- Scenario 1: percentage of random vehicles is 0%.
- Scenario 2: percentage of random vehicles is 10%.
- Scenario 3: percentage of random vehicles is 20%.
- Scenario 4: percentage of random vehicles is 30%.
3.2. Experiments
- Experiment 1: varying the number of vehicles on the map.
- Experiment 2: varying the number of random vehicles on the map.
- Experiment 3: varying the initial budget of the test vehicle.
3.3. Discussion and Future Work
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Syed, A.S.; Sierra-Sosa, D.; Kumar, A.; Elmaghraby, A. IoT in smart cities: A survey of technologies, practices and challenges. Smart Cities 2021, 4, 429–475.
2. Bertogna, M.; Burgio, P.; Cabri, G.; Capodieci, N. Adaptive coordination in autonomous driving: Motivations and perspectives. In Proceedings of the 2017 IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), Poznan, Poland, 21–23 June 2017; pp. 15–17.
3. Chen, J.; Wang, Q.; Cheng, H.H.; Peng, W.; Xu, W. A review of vision-based traffic semantic understanding in ITSs. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19954–19979.
4. J3016_202104; Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles. SAE International: Warrendale, PA, USA, 2018; Volume 4970, pp. 1–5.
5. Carlino, D.; Boyles, S.D.; Stone, P. Auction-based autonomous intersection management. In Proceedings of the 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013), The Hague, The Netherlands, 6–9 October 2013; pp. 529–534.
6. Mariani, S.; Cabri, G.; Zambonelli, F. Coordination of autonomous vehicles: Taxonomy and survey. ACM Comput. Surv. CSUR 2021, 54, 1–33.
7. HajiRassouliha, A.; Taberner, A.J.; Nash, M.P.; Nielsen, P.M. Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms. Signal Process. Image Commun. 2018, 68, 101–119.
8. Deng, Y.; Chen, Z.; Yao, X.; Hassan, S.; Wu, J. Task Scheduling for Smart City Applications Based on Multi-Server Mobile Edge Computing. IEEE Access 2019, 7, 14410–14421.
9. Pinciroli, C.; Beltrame, G. Swarm-Oriented Programming of Distributed Robot Networks. Computer 2016, 49, 32–41.
10. Murthy, D.K.; Masrur, A. Braking in Close Following Platoons: The Law of the Weakest. In Proceedings of the 2016 Euromicro Conference on Digital System Design (DSD), Limassol, Cyprus, 31 August–2 September 2016; pp. 613–620.
11. Diaz Ogás, M.G.; Fabregat, R.; Aciar, S. Survey of smart parking systems. Appl. Sci. 2020, 10, 3872.
12. Kotb, A.O.; Shen, Y.C.; Huang, Y. Smart parking guidance, monitoring and reservations: A review. IEEE Intell. Transp. Syst. Mag. 2017, 9, 6–16.
13. Tandon, R.; Gupta, P. Optimizing smart parking system by using fog computing. In Proceedings of the Advances in Computing and Data Sciences: Third International Conference, ICACDS 2019, Ghaziabad, India, 12–13 April 2019; Revised Selected Papers, Part II 3. Springer: Berlin/Heidelberg, Germany, 2019; pp. 724–737.
14. Khanna, A.; Anand, R. IoT based smart parking system. In Proceedings of the 2016 International Conference on Internet of Things and Applications (IOTA), Pune, India, 22–24 January 2016; pp. 266–270.
15. Kotb, A.O.; Shen, Y.C.; Zhu, X.; Huang, Y. iParker—A new smart car-parking system based on dynamic resource allocation and pricing. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2637–2647.
16. Sadhukhan, P. An IoT-based E-parking system for smart cities. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 13–16 September 2017; pp. 1062–1066.
17. Pham, T.N.; Tsai, M.F.; Nguyen, D.B.; Dow, C.R.; Deng, D.J. A Cloud-Based Smart-Parking System Based on Internet-of-Things Technologies. IEEE Access 2015, 3, 1581–1591.
18. Muzzini, F.; Capodieci, N.; Montangero, M. Improving urban viability through smart parking. Int. J. Parallel Emergent Distrib. Syst. 2023, 38, 522–540.
19. Zou, W.; Sun, Y.; Zhou, Y.; Lu, Q.; Nie, Y.; Sun, T.; Peng, L. Limited sensing and deep data mining: A new exploration of developing city-wide parking guidance systems. IEEE Intell. Transp. Syst. Mag. 2020, 14, 198–215.
20. Cox, T.; Thulasiraman, P. A zone-based traffic assignment algorithm for scalable congestion reduction. ICT Express 2017, 3, 204–208.
21. Capodieci, N.; Cavicchioli, R.; Muzzini, F.; Montagna, L. Improving emergency response in the era of ADAS vehicles in the Smart City. ICT Express 2021, 7, 481–486.
22. Li, J.; Ferguson, N. A multi-dimensional rescheduling model in disrupted transport network using rule-based decision making. Procedia Comput. Sci. 2020, 170, 90–97.
23. Schepperle, H.; Böhm, K. Agent-based traffic control using auctions. In Proceedings of the International Workshop on Cooperative Information Agents, Delft, The Netherlands, 19–21 September 2007; pp. 119–133.
24. Vickrey, W. Counterspeculation, auctions, and competitive sealed tenders. J. Financ. 1961, 16, 8–37.
25. Vasirani, M.; Ossowski, S. A market-inspired approach for intersection management in urban road traffic networks. J. Artif. Intell. Res. 2012, 43, 621–659.
26. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4909–4926.
27. Xie, J.; Xu, X.; Wang, F.; Liu, Z.; Chen, L. Coordination control strategy for human-machine cooperative steering of intelligent vehicles: A reinforcement learning approach. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21163–21177.
28. Wei, H.; Zheng, G.; Gayah, V.; Li, Z. Recent advances in reinforcement learning for traffic signal control: A survey of models and evaluation. ACM SIGKDD Explor. Newsl. 2021, 22, 12–18.
29. Glorio, N.; Mariani, S.; Cabri, G.; Zambonelli, F. An Adaptive Approach for the Coordination of Autonomous Vehicles at Intersections. In Proceedings of the 2021 IEEE 30th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), Bayonne, France, 27–29 October 2021; pp. 1–6.
30. Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996, 4, 237–285.
31. Joo, H.; Ahmed, S.H.; Lim, Y. Traffic signal control for smart cities using reinforcement learning. Comput. Commun. 2020, 154, 324–330.
32. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
33. Karthikeyan, P.; Chen, W.L.; Hsiung, P.A. Autonomous Intersection Management by Using Reinforcement Learning. Algorithms 2022, 15, 326.
34. Antonio, G.P.; Maria-Dolores, C. AIM5la: A latency-aware deep reinforcement learning-based autonomous intersection management system for 5G communication networks. Sensors 2022, 22, 2217.
35. Mushtaq, A.; Haq, I.U.; Sarwar, M.A.; Khan, A.; Khalil, W.; Mughal, M.A. Multi-Agent Reinforcement Learning for Traffic Flow Management of Autonomous Vehicles. Sensors 2023, 23, 2373.
36. Shi, Y.; Liu, Y.; Qi, Y.; Han, Q. A control method with reinforcement learning for urban un-signalized intersection in hybrid traffic environment. Sensors 2022, 22, 779.
37. Gutiérrez-Moreno, R.; Barea, R.; López-Guillén, E.; Araluce, J.; Bergasa, L.M. Reinforcement learning-based autonomous driving at intersections in CARLA simulator. Sensors 2022, 22, 8373.
38. Cabri, G.; Gherardini, L.; Montangero, M.; Muzzini, F. About auction strategies for intersection management when human-driven and autonomous vehicles coexist. Multimed. Tools Appl. 2021, 80, 15921–15936.
39. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
40. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
41. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
42. Lopez, P.A.; Behrisch, M.; Bieker-Walz, L.; Erdmann, J.; Flötteröd, Y.P.; Hilbrich, R.; Lücken, L.; Rummel, J.; Wagner, P.; Wießner, E. Microscopic Traffic Simulation using SUMO. In Proceedings of the 21st IEEE International Conference on Intelligent Transportation Systems, Maui, HI, USA, 4–7 November 2018.
43. Ejercito, P.M.; Nebrija, K.G.E.; Feria, R.P.; Lara-Figueroa, L.L. Traffic simulation software review. In Proceedings of the 2017 8th International Conference on Information, Intelligence, Systems & Applications (IISA), Larnaca, Cyprus, 27–30 August 2017; pp. 1–4.
44. Axhausen, K.W.; Horni, A.; Nagel, K. The Multi-Agent Transport Simulation MATSim; Ubiquity Press: London, UK, 2016.
45. Saidallah, M.; El Fergougui, A.; Elalaoui, A.E. A comparative study of urban road traffic simulators. MATEC Web Conf. 2016, 81, 05002.
46. Diallo, A.O.; Lozenguez, G.; Doniec, A.; Mandiau, R. Comparative evaluation of road traffic simulators based on modeler’s specifications: An application to intermodal mobility behaviors. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence, Virtual, 4–6 February 2021; SciTePress—Science and Technology Publications: Setúbal, Portugal, 2021; pp. 265–272.
47. Gherardini, L.; Cabri, G.; Montangero, M. Decentralized approaches for autonomous vehicles coordination. Internet Technol. Lett. 2022, e398.
| Ref. | Focus On | Intersection Type | Method |
|---|---|---|---|
| [5] | intersection manager | unsignalized | auctions (driver-specified budget) |
| [23] | intersection manager | unsignalized | auctions (two-step second-price sealed-bid) |
| [25] | intersection manager | unsignalized | reservation-based |
| [29] | intersection manager | unsignalized | reinforcement learning (to select predefined policy) |
| [31] | intersection manager | signalized | reinforcement learning |
| [33] | intersection manager | unsignalized | reinforcement learning |
| [34] | latencies | unsignalized | reinforcement learning |
| [35] | intersection manager | signalized | multi-agent reinforcement learning |
| [36] | vehicle manoeuvres | unsignalized | reinforcement learning |
| [37] | vehicle manoeuvres | unsignalized and signalized | reinforcement learning |
| Explanation | Symbol | Value |
|---|---|---|
| Sponsorship | | |
| Initial Budget | | … |
| Number of vehicles | VS | … |
| Maximum speed | v | |
| Explanation | Symbol | Value |
|---|---|---|
| Reward signal | w | 2 |
| Penalty signal | ℓ | −0.3 |
| Balancing parameter | | 0.3 |
| Training frequency | F | 10 |
| Update frequency | K | 10 |
| Threshold | E | 400 |
| Batch size | b | 32 |
| Discount factor | | 0.3 |
| Learning rate | | |