CAL: Core-Aware Lock for the big.LITTLE Multicore Architecture
Abstract
:1. Introduction
- We conduct a preliminary experiment to observe the fairness issue of traditional locks in big.LITTLE multicore architecture, and carry out an in-depth analysis of their performance problems. We point out the core-unaware lock on big.LITTLE multicore architecture induces performance degradation.
- Motivated by our observation, we designed a core-aware lock for the big.LITTLE multicore architecture to trade-off lock assignment between the big core and little core. We build this scheme based on ticket lock and MCS (John Mellor-Crummey and Michael Scott) lock, and then reorder the lock request according to our lock fairness model. Furthermore, we implemented CAL on top of the open-source lock framework named Litl library, which is widely used for studying lock optimization.
- We conduct a series of experiments on the benchmarks and key-value Store engine named LevelDB. The experimental result validates the effectiveness of our scheme. Compared to SOTA (state-of-the-art), the CAL’s fairness has increased by up to 67%; and its throughput is 26% higher than FIFO-based (First In First Out) lock and 53% higher than competition-based lock, respectively. In addition, the tail latency of CAL is always kept at a low level.
2. Background and Research Motivation
2.1. The big.LITTLE Multicore Architecture
2.2. Lock Classification
- 1.
- 2.
- Competition-based locks, which is a lightweight lock. It contains ttas lock [8], simple spinlock [9], pthread lock and so on. These threads themselves call a single atomic operation (e.g., CAS (compare and swap)) to capture the lock. If the atomic instruction executes successfully, the thread enters the critical section; if the atomic instruction fails, the thread continues to execute this atomic instruction until success.
- 3.
- Cohort lock [10], which is also known as hierarchical locks and designed for NUMA architecture [10,11,12]. This lock aims to improve the data validity lifetime in the CPU cache. This lock generally consists of two layers: the top layer uses a global lock, and the bottom layer uses a local lock for each NUMA node. The hierarchical lock prioritizes threads in the same NUMA node for a fixed period, avoiding cross-node traffic overhead while periodically rotating the global lock between NUMA nodes to achieve long-term fairness.
2.3. Motivation
2.3.1. Problem Statement
2.3.2. Problem Definition
3. Design
3.1. Overview
Algorithm 1 CAL algorithm |
|
3.2. Design of CAL Lock Algorithm
3.3. An Example of CAL Lock
3.4. Overhead Analysis
3.5. Implementation
4. Experiment
4.1. Experimental Environment
4.2. Micro-Benchmarks
4.3. Application Benchmarks
5. Related Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Boyd-Wickizer, S.; Kaashoek, M.F.; Morris, R.; Zeldovich, N. Non-scalable locks are dangerous. In Proceedings of the Linux Symposium, San Diego, CA, USA, 27–31 August 2012; pp. 119–130. [Google Scholar]
- Dice, D. Malthusian locks. In Proceedings of the Twelfth European Conference on Computer Systems, Belgrade, Serbia, 23–26 April 2017; pp. 314–327. [Google Scholar]
- Salami, B.; Noori, H.; Naghibzadeh, M. Fairness-aware energy efficient scheduling on heterogeneous multi-core processors. IEEE Trans. Comput. 2020, 70, 72–82. [Google Scholar] [CrossRef]
- Mascitti, A.; Cucinotta, T. Dynamic Partitioned Scheduling of Real-Time DAG Tasks on ARM big. LITTLE Architectures. In Proceedings of the 29th International Conference on Real-Time Networks and Systems, Nantes, France, 7–9 April 2021; pp. 1–11. [Google Scholar]
- Tavakkol, A.; Sadrosadati, M.; Ghose, S.; Kim, J.; Luo, Y.; Wang, Y.; Ghiasi, N.M.; Orosa, L.; Gómez-Luna, J.; Mutlu, O. FLIN: Enabling fairness and enhancing performance in modern NVMe solid state drives. In Proceedings of the 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA, 1–6 June 2018; pp. 397–410. [Google Scholar]
- Dice, D. Brief announcement: A partitioned ticket lock. In Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, New York, NY, USA, 4–6 June 2011; pp. 309–310. [Google Scholar]
- Craig, T. Building FIFO and Priorityqueuing Spin Locks from Atomic Swap; Technical Report, Technical Report TR 93-02-02; Department of Computer Science, University of Washington: Seattle, WA, USA, 1993. [Google Scholar]
- Anderson, T.E. The performance of spin lock alternatives for shared-memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 1990, 1, 6–16. [Google Scholar] [CrossRef]
- Scott, M.L. Shared-Memory Synchronization; Morgan & Claypool Publishers: San Rafael, CA, USA, 2013. [Google Scholar]
- Dice, D.; Marathe, V.J.; Shavit, N. Lock cohorting: A general technique for designing NUMA locks. ACM Sigplan Not. 2012, 47, 247–256. [Google Scholar] [CrossRef]
- Chabbi, M.; Fagan, M.; Mellor-Crummey, J. High performance locks for multi-level NUMA systems. ACM Sigplan Not. 2015, 50, 215–226. [Google Scholar] [CrossRef]
- Kashyap, S.; Min, C.; Kim, T. Scalable {NUMA-aware} Blocking Synchronization Primitives. In Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC 17), Santa Clara, CA, USA, 12–14 July 2017; pp. 603–615. [Google Scholar]
- Liu, N.; Gu, J.; Tang, D.; Li, K.; Zang, B.; Chen, H. Asymmetry-aware scalable locking. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, 2–6 April 2022; pp. 294–308. [Google Scholar]
- Guiroux, H.; Lachaize, R.; Quéma, V. Multicore Locks: The Case Is Not Closed Yet. In Proceedings of the USENIX Annual Technical Conference, Denver, CO, USA, 22–24 June 2016. [Google Scholar]
- Kashyap, S.; Calciu, I.; Cheng, X.; Min, C.; Kim, T. Scalable and practical locking with shuffling. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, Huntsville, ON, Canada, 27–30 October 2019; pp. 586–599. [Google Scholar]
- Dice, D.; Kogan, A. Compact NUMA-aware locks. In Proceedings of the Fourteenth EuroSys Conference 2019, Dresden, Germany, 25–28 March 2019; pp. 1–15. [Google Scholar]
- Guerraoui, R.; Guiroux, H.; Lachaize, R.; Quéma, V.; Trigonakis, V. Lock–unlock: Is that all? a pragmatic analysis of locking in software systems. ACM Trans. Comput. Syst. 2019, 36, 1–149. [Google Scholar] [CrossRef]
- de Lima Chehab, R.L.; Paolillo, A.; Behrens, D.; Fu, M.; Härtig, H.; Chen, H. Clof: A compositional lock framework for multi-level numa systems. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, Virtual Event, 26–29 October 2021; pp. 851–865. [Google Scholar]
- Guiroux, H. Understanding the Performance of Mutual Exclusion Algorithms on Modern Multicore Machines. Ph.D. Thesis, Université Grenoble Alpes, Grenoble, France, 2018. [Google Scholar]
- Hendler, D.; Incze, I.; Shavit, N.; Tzafrir, M. Flat combining and the synchronization-parallelism tradeoff. In Proceedings of the Twenty-Second Annual ACM Symposium on Parallelism in Algorithms and Architectures, Thira Santorini, Greece, 13–15 June 2010; pp. 355–364. [Google Scholar]
- Reed, D.P.; Kanodia, R.K. Synchronization with eventcounts and sequencers. Commun. ACM 1979, 22, 115–123. [Google Scholar] [CrossRef]
- Kylheku, K. What is PTHREAD_MUTEX_ADAPTIVE_NP. Retrieved Novemb. 2014, 8, 2018. [Google Scholar]
- Mellor-Crummey, J.M.; Scott, M.L. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. (TOCS) 1991, 9, 21–65. [Google Scholar] [CrossRef]
- Sanjay Ghemawat, J.D. LevelDB. 2024. Available online: https://github.com/google/leveldb (accessed on 17 July 2024).
- Radovic, Z.; Hagersten, E. Hierarchical backoff locks for nonuniform communication architectures. In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture, Anaheim, CA, USA, 12 February 2003; HPCA-9. pp. 241–252. [Google Scholar]
- Lozi, J.P.; David, F.; Thomas, G.; Lawall, J.; Muller, G. Fast and portable locking for multicore architectures. ACM Trans. Comput. Syst. (TOCS) 2016, 33, 1–62. [Google Scholar] [CrossRef]
- Roghanchi, S.; Eriksson, J.; Basu, N. Ffwd: Delegation is (much) faster than you think. In Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, 28 October 2017; pp. 342–358. [Google Scholar]
- Karlin, A.R.; Li, K.; Manasse, M.S.; Owicki, S. Empirical studies of competitve spinning for a shared-memory multiprocessor. ACM Sigops Oper. Syst. Rev. 1991, 25, 41–55. [Google Scholar] [CrossRef]
- Al Bahra, S. Nonblocking Algorithms and Scalable Multicore Programming: Exploring some alternatives to lock-based synchronization. Queue 2013, 11, 40–64. [Google Scholar] [CrossRef]
- Harris, T.L. A pragmatic implementation of non-blocking linked-lists. In International Symposium on Distributed Computing; Springer: Berlin/Heidelberg, Germany, 2001; pp. 300–314. [Google Scholar]
- Hart, T.E.; McKenney, P.E.; Brown, A.D.; Walpole, J. Performance of memory reclamation for lockless synchronization. J. Parallel Distrib. Comput. 2007, 67, 1270–1285. [Google Scholar] [CrossRef]
- Michael, M.M. High performance dynamic lock-free hash tables and list-based sets. In Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures, Winnipeg, MB, Canada, 10–13 August 2002; pp. 73–82. [Google Scholar]
Property | Parameter |
---|---|
Processor | 13th Gen Intel(R) Core(TM) i5-13400 |
OS | Ubuntu: 20.04 |
Linux Kernel | Linux 5.15.0-89-generic |
Glibc | 2.31 |
Gcc | 9.4.0 |
Property | Value |
---|---|
Total cores | 10 |
Number of performance-cores | 6 |
Linux Kernel | Linux 5.15.0-89-generic |
Number of efficient-cores | 4 |
Performance-core base frequency | 2.50 GHz |
Efficient-core base frequency | 1.80 GHz |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nie, S.; Liu, Y.; Niu, J.; Wu, W. CAL: Core-Aware Lock for the big.LITTLE Multicore Architecture. Appl. Sci. 2024, 14, 6449. https://doi.org/10.3390/app14156449
Nie S, Liu Y, Niu J, Wu W. CAL: Core-Aware Lock for the big.LITTLE Multicore Architecture. Applied Sciences. 2024; 14(15):6449. https://doi.org/10.3390/app14156449
Chicago/Turabian StyleNie, Shiqiang, Yingming Liu, Jie Niu, and Weiguo Wu. 2024. "CAL: Core-Aware Lock for the big.LITTLE Multicore Architecture" Applied Sciences 14, no. 15: 6449. https://doi.org/10.3390/app14156449
APA StyleNie, S., Liu, Y., Niu, J., & Wu, W. (2024). CAL: Core-Aware Lock for the big.LITTLE Multicore Architecture. Applied Sciences, 14(15), 6449. https://doi.org/10.3390/app14156449