Multi-Agent Credit Assignment and Bankruptcy Game for Improving Resource Allocation in Smart Cities
Abstract
:1. Introduction
2. Related Work
- Temporal CA
- Structural CA
- Social CA
- Multi-agent CA
- Actor–critic
- Q-learning
- SARSA
3. Preliminaries
3.1. Definitions
3.2. Bankruptcy Concepts
3.2.1. Definition 1: Bankruptcy Problem
3.2.2. Definition 2: Bankruptcy Game
3.2.3. Adjusted Proportional Bankruptcy Rule (AP Rule)
3.3. Problem Definition
- Fair–inefficient;
- Fair–efficient;
- Unfair–inefficient;
- Unfair–efficient.
- Unfair and efficient;
- Fair and inefficient.
3.4. Multi-Score Puzzle
4. Methodology
4.1. Reverse Adjusted Proportional Algorithm (RevAP)
Algorithm 1: Algorithm RevAP |
Inputs: Vector of Agents’ Reward r(rT.1, rT.2, …, rT.n) where for any rT.i = 0 R: Global Reward Outputs: Valued Vector r(rT.1, rT.2, …, rT.n)
|
- TS-only method.
- TS + MAS priority method (TS + MAS method).
- TS + expert agent priority method (TS + ExAg method).
4.2. MARL as Execution Platform
4.2.1. Training Phase
Algorithm 2: Algorithm Training. |
Inputs: MAS Outputs: A ranked list of agents based on their knowledge
|
4.2.2. Test Phase
- The reward that the critic receives from the environment and distributes among the agents is high enough so that all the agents start to work.
- The reward that the critic receives from the environment and distributes among the agents is so low that no agent starts to work.
- The reward that the critic receives from the environment is less than the total needs of the agents to reach their TST, but some of them can start to work anyway.
Algorithm 3: Algorithm Test. |
Inputs: MAS: a set of agents Multi-score Puzzle: as the environment Outputs: Solved Puzzle Score of game
|
5. Evaluation and Results
5.1. Group Learning Rate
5.2. Confidence
5.3. Expertness
- : Number of times that the agent receives a reward.
- : Number of times that the agent receives a punishment.
5.4. Certainty
5.5. Efficiency
5.6. Correctness
6. Discussion
7. Conclusions and Technical Outlook
7.1. Conclusions
7.2. Technical Outlook
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Javed, A.R.; Shahzad, F.; ur Rehman, S.; Zikria, Y.B.; Razzak, I.; Jalil, Z.; Xu, G. Future smart cities requirements, emerging technologies, applications, challenges, and future aspects. Cities 2022, 129, 103794. [Google Scholar] [CrossRef]
- Mahmood, O.A.; Abdellah, A.R.; Muthanna, A.; Koucheryavy, A. Distributed Edge Computing for Resource Allocation in Smart Cities Based on the IoT. Information 2022, 13, 328. [Google Scholar] [CrossRef]
- Jan, B.; Farman, H.; Khan, M.; Talha, M.; Din, I.U. Designing a smart transportation system: An internet of things and big data approach. IEEE Wirel. Commun. 2019, 26, 73–79. [Google Scholar] [CrossRef]
- Vergütz, A.; G. Prates, N., Jr.; Henrique Schwengber, B.; Santos, A.; Nogueira, M. An Architecture for the Performance Management of Smart Healthcare Applications. Sensors 2020, 20, 5566. [Google Scholar] [CrossRef]
- Liu, H.; Li, S.; Sun, W. Resource allocation for edge computing without using cloud center in smart home environment: A pricing approach. Sensors 2020, 20, 6545. [Google Scholar] [CrossRef]
- Thornbush, M.; Golubchikov, O. Smart energy cities: The evolution of the city-energy-sustainability nexus. Environ. Dev. 2021, 39, 100626. [Google Scholar] [CrossRef]
- Pech, M.; Vrchota, J.; Bednář, J. Predictive maintenance and intelligent sensors in smart factory. Sensors 2021, 21, 1470. [Google Scholar] [CrossRef] [PubMed]
- Wooldridge, M. An Introduction to Multiagent Systems; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
- Liu, B.; Xie, Y.; Feng, L.; Fu, P. Correcting biased value estimation in mixing value-based multi-agent reinforcement learning by multiple choice learning. Eng. Appl. Artif. Intell. 2022, 116, 105329. [Google Scholar] [CrossRef]
- Harati, A.; Ahmadabadi, M.N.; Araabi, B.N. Knowledge-based multiagent credit assignment: A study on task type and critic information. IEEE Syst. J. 2007, 1, 55–67. [Google Scholar] [CrossRef]
- Nguyen, D.T.; Kumar, A.; Lau, H.C. Credit Assignment for Collective Multiagent RL with Global Rewards. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, Montreal, QC, Canada, 3–8 December 2018; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 8113–8124. [Google Scholar]
- Wang, Y.; Han, B.; Wang, T.; Dong, H.; Zhang, C. Off-policy multi-agent decomposed policy gradients. arXiv 2020, arXiv:2007.12322. [Google Scholar]
- Rahaie, Z.; Beigy, H. Critic learning in multi agent credit assignment problem. J. Intell. Fuzzy Syst. 2016, 30, 3465–3480. [Google Scholar] [CrossRef]
- Xiang, L.; Tan, Y.; Shen, G.; Jin, X. Applications of multi-agent systems from the perspective of construction management: A literature review. Eng. Constr. Archit. Manag. 2021, 29, 3288–3310. [Google Scholar] [CrossRef]
- Oderanti, F.O.; Li, F.; De Wilde, P. Application of strategic fuzzy games to wage increase negotiation and decision problems. Expert Syst. Appl. 2012, 39, 11103–11114. [Google Scholar] [CrossRef]
- O’Neill, B. A problem of rights arbitration from the Talmud. Math. Soc. Sci. 1982, 2, 345–371. [Google Scholar] [CrossRef]
- Streitz, N.A. Citizen Centered Design for Humane and Sociable Hybrid Cities. In International Biennial Conference Hybrid City; Academia: Athens, Greece, 2015; pp. 17–20. [Google Scholar]
- Ramírez-Moreno, M.A.; Keshtkar, S.; Padilla-Reyes, D.A.; Ramos-López, E.; García-Martínez, M.; Hernández-Luna, M.C.; Mogro, A.E.; Mahlknecht, J.; Huertas, J.I.; Peimbert-García, R.E.; et al. Sensors for sustainable smart cities: A review. Appl. Sci. 2021, 11, 8198. [Google Scholar] [CrossRef]
- Zhao, L.; Wang, J.; Liu, J.; Kato, N. Optimal edge resource allocation in IoT-based smart cities. IEEE Netw. 2019, 33, 30–35. [Google Scholar] [CrossRef]
- Yigitcanlar, T. Smart cities: An effective urban development and management model? Aust. Plan. 2015, 52, 27–34. [Google Scholar] [CrossRef]
- Clemen, T.; Ahmady-Moghaddam, N.; Lenfers, U.A.; Ocker, F.; Osterholz, D.; Ströbele, J.; Glake, D. Multi-agent systems and digital twins for smarter cities. In Proceedings of the 2021 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, Virtual Event, 31 May–2 June 2021; pp. 45–55. [Google Scholar]
- De Haan, J.B. Animal Psychology: Its Nature and Its Problems; Routledge: London, UK, 2018. [Google Scholar]
- Chen, X.; Liu, G. Federated Deep Reinforcement Learning-Based Task Offloading and Resource Allocation for Smart Cities in a Mobile Edge Network. Sensors 2022, 22, 4738. [Google Scholar] [CrossRef]
- Zhang, K.; Yang, Z.; Başar, T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. In Handbook of Reinforcement Learning and Control; Springer: Berlin/Heidelberg, Germany, 2021; pp. 321–384. [Google Scholar]
- Zhou, M.; Liu, Z.; Sui, P.; Li, Y.; Chung, Y.Y. Learning implicit credit assignment for cooperative multi-agent reinforcement learning. Adv. Neural Inf. Process. Syst. 2020, 33, 11853–11864. [Google Scholar]
- Skinner, B.F. The Behavior of Organisms: An Experimental Analysis; BF Skinner Foundation: Cambridge, MA, USA, 2019. [Google Scholar]
- Guisi, D.M.; Ribeiro, R.; Teixeira, M.; Borges, A.P.; Enembreck, F. Reinforcement learning with multiple shared rewards. Procedia Comput. Sci. 2016, 80, 855–864. [Google Scholar] [CrossRef]
- Bagnell, D.; Ng, A. On local rewards and scaling distributed reinforcement learning. Adv. Neural Inf. Process. Syst. 2005, 18, 91–98. [Google Scholar]
- Omidshafiei, S.; Kim, D.K.; Liu, M.; Tesauro, G.; Riemer, M.; Amato, C.; Campbell, M.; How, J.P. Learning to teach in cooperative multiagent reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6128–6136. [Google Scholar]
- Kim, W.; Park, J.; Sung, Y. Communication in multi-agent reinforcement learning: Intention sharing. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020; pp. 1–15. [Google Scholar]
- Salimibeni, M.; Mohammadi, A.; Malekzadeh, P.; Plataniotis, K.N. Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation. Sensors 2022, 22, 1393. [Google Scholar] [CrossRef]
- Li, J.; Kuang, K.; Wang, B.; Liu, F.; Chen, L.; Wu, F.; Xiao, J. Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning. arXiv 2021, arXiv:2106.00285. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Rahmattalabi, A.; Chung, J.J.; Colby, M.; Tumer, K. D++: Structural credit assignment in tightly coupled multiagent domains. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 4424–4429. [Google Scholar]
- Mao, W.; Gratch, J. The social credit assignment problem. In Proceedings of the International Workshop on Intelligent Virtual Agents; Springer: Berlin/Heidelberg, Germany, 2003; pp. 39–47. [Google Scholar]
- Mannion, P.; Devlin, S.; Duggan, J.; Howley, E. Multi-agent credit assignment in stochastic resource management games. Knowl. Eng. Rev. 2017, 32, e16. [Google Scholar] [CrossRef] [Green Version]
- Rahaie, Z.; Beigy, H. Expertness framework in multi-agent systems and its application in credit assignment problem. Intell. Data Anal. 2014, 18, 511–528. [Google Scholar] [CrossRef]
- Airiau, S. Cooperative games and multiagent systems. Knowl. Eng. Rev. 2013, 28, 381–424. [Google Scholar] [CrossRef]
- Wang, J.; Hong, Y.; Wang, J.; Xu, J.; Tang, Y.; Han, Q.L.; Kurths, J. Cooperative and Competitive Multi-Agent Systems: From Optimization to Games. IEEE/CAA J. Autom. Sin. 2022, 9, 763–783. [Google Scholar] [CrossRef]
- Yea, M.; Kim, D.; Cheong, T.; Moon, J.; Kang, S. Baking and slicing the pie: An application to the airline alliance’s profit-sharing based on cooperative game theory. J. Air Transp. Manag. 2022, 102, 102219. [Google Scholar] [CrossRef]
- Xue, Y.; Deng, Y. A real Shapley value for evidential games with fuzzy characteristic function. Eng. Appl. Artif. Intell. 2021, 104, 104350. [Google Scholar] [CrossRef]
- Shapley, L.S. Cores of convex games. Int. J. Game Theory 1971, 1, 11–26. [Google Scholar] [CrossRef]
- Luo, C.; Zhou, X.; Lev, B. Core, shapley value, nucleolus and nash bargaining solution: A Survey of recent developments and applications in operations management. Omega 2022, 110, 102638. [Google Scholar] [CrossRef]
- Curiel, I.J.; Maschler, M.; Tijs, S.H. Bankruptcy games. Z. Für Oper. Res. 1987, 31, A143–A159. [Google Scholar] [CrossRef]
- Mahini, H.; Navidi, H.; Babaei, F.; Mousavirad, S.M. EvoBank: An evolutionary game solution for Bankruptcy problem. Swarm Evol. Comput. 2021, 67, 100959. [Google Scholar] [CrossRef]
- Araújo, J.P.; Figueiredo, M.A.; Ayala Botto, M. Control with adaptive Q-learning: A comparison for two classical control problems. Eng. Appl. Artif. Intell. 2022, 112, 104797. [Google Scholar] [CrossRef]
- Ramík, J.; Vlach, M. Bankruptcy problem under uncertainty of claims and estate. Fuzzy Sets Syst. 2022. [Google Scholar] [CrossRef]
- Arunraja, A.; Jayanthy, S. Tuning methods of various controllers. Mater. Today Proc. 2021. [Google Scholar] [CrossRef]
Method | Equal Approach | Fair Approach | Knowledge-Based | Efficiency Performance | Contribution | Advantage | Disadvantage | Best Performance |
---|---|---|---|---|---|---|---|---|
Ranking | × | × | ✓ | Partly | Introducing the parameters to improve the knowledge-based method to solve the MCA | The agents’ knowledge is used | The possibility of assigning a reward to the less knowledgeable agents | Performance matters somewhat |
History base | × | Partly | ✓ | × | Introducing a model-based method to solve the MCA | Modeling the MCA by a graph | Weakness in scalability | Model-based |
Dynamic | × | × | ✓ | Partly | Introducing a new parameter to solve the MCA | The agents’ knowledge is used | Low accuracy | The numbers of features are low |
TS-only | × | × | ✓ | ✓ | 1. Introducing the TST constraint 2. Introducing the MsP problem 3. Introducing the bankruptcy method to solve the MCA problem | Improves the system’s performance | Ignores the remaining reward | The performance is important |
TS + MAS | × | × | ✓ | ✓ | 1. Introducing the TST constraint 2. Introducing the MsP problem 3. Introducing the bankruptcy method to solve the MCA problem | Improves the system’s performance | Training phase is needed | The performance is important |
TS + ExAg | × | × | ✓ | ✓ | 1. Introducing the TST constraint 2. Introducing the MsP problem 3. Introducing the bankruptcy method to solve the MCA problem | Improves the system’s performance | Training phase is needed | The performance is important |
LeCTR | ✓ | × | × | × | Peer-to-peer teaching in cooperative MARL | Simple | Unfair and inefficient | Simplicity is important |
IS | ✓ | × | × | × | Enhance the coordination among the agents | Simple | Unfair and inefficient | Simplicity is important |
MAK-TD | × | ✓ | × | × | A coupled gradient descent is adopted for developing a method to approximate the reward | The possibility of using heterogeneous agents | Potential information loss in high dimensions | Fairness is important |
SQDDPG | × | ✓ | × | × | Applies the “Shapley” method to solve the MCA problem | Global reward distribution is guaranteed | High complexity | Fairness is important |
Proposed Method | Remaining Reward | Advantage | Disadvantage | Learning Rate | Confidence | Expertness | Certainty | Efficiency | Correctness |
---|---|---|---|---|---|---|---|---|---|
TS-only | Ignored | Simpler than TS + MAS and TS + ExAg | Ignores the remaining reward | Approximately similar to the other proposed methods | Third best | Third best | Third best | Approximately similar to the other proposed methods | Third best |
TS + MAS | Assigns to active agents based on their knowledge | Fairer than TS-MAS and TS + ExAg | More complicated than the other proposed methods | Similar to the other proposed methods | Second best | Second best | Second best | Similar to the other proposed methods | Second best |
TS + ExAg | Assigns to the knowledgeable agent | The best performance between the proposed methods | High dependence on the knowledgeable agents | Similar to the other proposed methods | The best | The best | The best | Similar to the other proposed methods | The best |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yarahmadi, H.; Shiri, M.E.; Challenger, M.; Navidi, H.; Sharifi, A. Multi-Agent Credit Assignment and Bankruptcy Game for Improving Resource Allocation in Smart Cities. Sensors 2023, 23, 1804. https://doi.org/10.3390/s23041804
Yarahmadi H, Shiri ME, Challenger M, Navidi H, Sharifi A. Multi-Agent Credit Assignment and Bankruptcy Game for Improving Resource Allocation in Smart Cities. Sensors. 2023; 23(4):1804. https://doi.org/10.3390/s23041804
Chicago/Turabian StyleYarahmadi, Hossein, Mohammad Ebrahim Shiri, Moharram Challenger, Hamidreza Navidi, and Arash Sharifi. 2023. "Multi-Agent Credit Assignment and Bankruptcy Game for Improving Resource Allocation in Smart Cities" Sensors 23, no. 4: 1804. https://doi.org/10.3390/s23041804
APA StyleYarahmadi, H., Shiri, M. E., Challenger, M., Navidi, H., & Sharifi, A. (2023). Multi-Agent Credit Assignment and Bankruptcy Game for Improving Resource Allocation in Smart Cities. Sensors, 23(4), 1804. https://doi.org/10.3390/s23041804