A Fast Cold-Start Solution: Container Space Reuse Based on Resource Isolation
Abstract
1. Introduction
- First, isolation of functions needs to be implemented in a sandbox that ensures performance and functional security. Typically, cloud providers use containers or virtualization technologies to provide a reliable runtime environment for different tenants, such as the current mainstream sandbox technologies Docker [5], gVisor [6], Kata [7], Firecracker [8], etc.;
- Second, it is necessary to provide various middleware that supports function customization, communication data collection, and operation status monitoring, achieving loose coupling between functions and the underlying platform [9]. In addition, to expedite instance initialization, functions are usually wrapped with a one-to-one or one-for-all prewarm approach [10];
- Then, the availability and stability of user applications need to be ensured by dynamic container scheduling, not only to avoid resource contention but also to recover idle resources. For example, the literature [11,12,13] has implemented resource scheduling optimization at different levels;
- Unlike the mainstream open source serverless platform, which classifies the status of function containers as “running” and “idle”, we use Prometheus to monitor function containers by extracting key parameters including memory usage, CPU usage, and QPS. Based on these parameters, we add a function container status of “low usage”, which aims to obtain more reusable resources from the idle and low usage container pools without consuming additional system resources.
- We innovatively design a function wrapper with host privileges based on the nesting technique (namespace) and resource restriction technique (cgroups), which can “steal” the remaining resources from idle containers and low usage containers, and we transform these resources into a temporary container space in the form of a child process to meet the new function resource usage requirements in a safe and isolated manner.
- In our designed cold-start solution, when a new function needs to establish an initialized runtime environment, we use the nearest similarity rule of the CPU and memory resources to dynamically match suitable container resources from the idle container pool or the low usage container pool, and the runtime only needs to prepare the dependencies of the new function itself, which can effectively achieve a fast response to a cold start.
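The nearest-similarity matching described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the `(container_id, spare_cpu, spare_mem)` tuple layout and the Euclidean distance metric are our assumptions.

```python
def match_container(cpu_need, mem_need, candidates):
    """Pick the pooled container whose spare (CPU, memory) is closest to
    the new function's request, among those with enough of both.

    candidates: iterable of (container_id, spare_cpu, spare_mem) tuples.
    Returns the best container_id, or None if nothing fits.
    """
    best_id, best_dist = None, float("inf")
    for cid, spare_cpu, spare_mem in candidates:
        if spare_cpu < cpu_need or spare_mem < mem_need:
            continue  # cannot host the new function at all
        # Euclidean distance in (CPU, memory) space: a smaller distance means
        # a tighter fit, leaving larger leftovers free for other requests.
        dist = ((spare_cpu - cpu_need) ** 2 + (spare_mem - mem_need) ** 2) ** 0.5
        if dist < best_dist:
            best_id, best_dist = cid, dist
    return best_id
```

A tighter fit is preferred so that containers with large spare capacity remain available for bigger reuse requests later.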
2. Related Work
- Virtual resource isolation
- Container prewarm startup
- Container resource elasticity management
3. Container Pool Resource Classification
3.1. Idle Container Identification
3.2. Low Usage Container Identification
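Sections 3.1 and 3.2 identify idle and low usage containers from the monitored metrics (memory usage, CPU usage, QPS). A minimal classification sketch follows; the cutoff values are illustrative placeholders rather than the paper's tuned parameters, except the 30 s idle threshold mentioned in the Discussion.

```python
def classify_container(cpu_pct, mem_pct, qps, idle_seconds,
                       idle_threshold=30, low_cpu=10, low_mem=20, low_qps=1):
    """Classify a function container as "idle", "low_usage", or "running"
    from its monitored metrics. Threshold values are illustrative."""
    if qps == 0 and idle_seconds >= idle_threshold:
        return "idle"       # no traffic long enough: the whole limit is reusable
    if cpu_pct < low_cpu and mem_pct < low_mem and qps <= low_qps:
        return "low_usage"  # serving lightly: only spare capacity is reusable
    return "running"
```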
4. Function Wrapper
4.1. Wrapper Design
4.2. Container Resource Reuse
- As shown in (a), function A is recognized as an idle function container. The processor ends function A’s subprocess and starts a new namespace subprocess in a new directory of the container. The new function subprocess is allocated the same amount of resources as the container (i.e., its requests and limits), and all dependency packages for the to-be-helped functions are installed. When a new request for function B arrives, the processor receives the cold-start request for function B forwarded by the scheduling service. The processor then sends a request to the internal function subprocess, which downloads the code package for function B. After function B’s code is ready, the processor forwards the request to function B, and function B’s cold start ends. If a request for function A arrives again, the idle container of function A checks whether a to-be-helped function (such as function B) is currently borrowing its resources. If so, because function A holds the highest priority, the idle container of function A refuses to receive new requests; the processor removes the code and log information of helped function B and then downloads function A’s code package to process the request for function A. Once the new function A image is prepared, the request for function A is redirected to the new function A image instance, and the idle-container life cycle of function A ends.
- As shown in (b), function A is recognized as a low usage function container. The processor starts a new namespace subprocess in a new directory of the container. The new function subprocess is sized to the container’s remaining available resources, and all dependency packages for the to-be-helped functions are installed. When the processor handles function B’s cold-start request, helped function B continues to provide service as long as function B’s image is not yet ready. Once the image of function B is ready, the processor receives a request to clean up helped function B so that the space can serve other to-be-helped functions.
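The wrapper’s planning step can be sketched as below: compute the spare resources (the full limit for an idle container, limit minus current usage for a low usage one), derive the cgroup limits for the temporary child space, and build the `unshare` command that would launch the helped function in fresh namespaces. All paths, file names, and flag choices here are illustrative assumptions; the sketch only constructs the plan and does not touch the host.

```python
def build_reuse_plan(container_limits, container_usage, helped_fn, workdir):
    """Plan a temporary child-process space inside an existing container.

    container_limits / container_usage: dicts with "cpu_cores" and
    "mem_bytes" keys (hypothetical layout). Returns the spare resources,
    the cgroup files that would be written, and the command that would
    start the helped function's runtime in new PID/mount/UTS namespaces.
    """
    spare = {k: container_limits[k] - container_usage.get(k, 0)
             for k in container_limits}
    cgroup_path = f"/sys/fs/cgroup/reuse/{helped_fn}"  # illustrative path
    # cgroup v2 cpu.max takes "quota period"; 100000 us is the default period.
    cgroup_writes = {
        f"{cgroup_path}/memory.max": str(spare["mem_bytes"]),
        f"{cgroup_path}/cpu.max": f"{int(spare['cpu_cores'] * 100000)} 100000",
    }
    cmd = ["unshare", "--pid", "--mount", "--uts", "--fork",
           "python3", f"{workdir}/{helped_fn}/handler.py"]
    return spare, cgroup_writes, cmd
```

In a real deployment the wrapper would need host privileges to create the cgroup, write these files, and exec the command; error handling and cleanup on owner-function return are omitted here.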
5. Dynamically Matching Reusable Container Resources
5.1. Scheduler Service
5.2. Relieve Cold-Start
- The idle container: If new requests for function C arrive during the reuse process, we prioritize serving function C’s requests. Function G stops receiving new requests and releases the borrowed resources immediately after processing its current request. In addition, if the function G image fails to start, the scheduler service retries the search for reusable container resources in the container pool. If no suitable container resources can be matched, or the function G image still does not start, function G waits for its image to start successfully.
- The low usage container: If the QPS of function D suddenly increases while function D is being reused, and function H is also processing requests, overall resource usage rises. In extreme cases, function D satisfies the demand by scaling up the container. After function H finishes processing, the reused function D resources are released, and this process only briefly affects the container scale-up of function D.
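The priority rule in the two cases above can be modeled with a small toy state machine, sketched below under our own naming (the paper does not prescribe this interface): the container’s original function always preempts the helped function, which is evicted once the owner’s traffic returns.

```python
class ReusedContainer:
    """Toy model of the preemption rule: the container's original (owner)
    function always takes priority over the helped (reusing) function."""

    def __init__(self, owner):
        self.owner = owner
        self.helped = None
        self.accepting_helped = True

    def lend_to(self, fn):
        """Lend this container's spare resources to a helped function."""
        self.helped = fn
        self.accepting_helped = True

    def on_request(self, fn):
        if fn == self.owner:
            # Owner traffic returns: refuse further helped requests
            # and evict the helped function's temporary space.
            self.accepting_helped = False
            self.helped = None
            return "owner_served"
        if fn == self.helped and self.accepting_helped:
            return "helped_served"
        return "rejected"
```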
5.3. Container Resource Selection
6. Experiment
6.1. Benchmark Configuration
6.2. Comparison Relationship of Function Containers
6.3. The Impact of Idle Function Threshold
6.4. Comparison Test of Cold-Start Latency
6.5. Testing of Resource Reuse Solutions
- The idle function container testing: In the initial stage, the function container stops receiving calls after a period of frequent invocation. Then, when Alphidae discovers that the container’s resources have been reused by other functions, it issues a new request for the original function in the container. We then observe whether the original function’s cold-start latency changes significantly, or whether the request fails outright.
- The low usage function container testing: We use a call every 20 s to reduce the frequency of function container usage. When Alphidae discovers that the resources in the container have been reused by other functions, it begins to increase the QPS of the existing functions in the container. We also observe whether there is a significant change in the cold-start latency of the original function in the container.
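The low-frequency probing used in these tests can be sketched as a simple driver; `invoke` is a stand-in callable we introduce for illustration (in the real experiment it would be an HTTP invocation of the function endpoint), and the 20 s interval matches the low usage test above.

```python
import time

def probe_cold_start(invoke, interval_s=20, rounds=3):
    """Call `invoke` once per `interval_s` seconds and record per-call
    latency, mimicking the low-rate driver used in the reuse tests.
    Returns the list of observed latencies in seconds."""
    latencies = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        invoke()
        latencies.append(time.perf_counter() - t0)
        time.sleep(interval_s)
    return latencies
```

Comparing the recorded latencies before and after the container’s resources are reused shows whether reuse disturbs the original function.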
7. Discussion
- We found through container monitoring that resource management inevitably wastes resources in two situations. First, the reusable resources in a container may be too small to satisfy any new function’s reuse request, and these leftover resources persist until the container dies. Second, even after reusable resources have been lent to new functions, some redundant resources remain in the container; these are too small to be reused again, so the accumulated fragments are wasted as their number grows. The resources a node can provide are limited, and if such waste is not addressed, the system may fail to respond to services due to insufficient resources. In our design, the remaining resources in a container are reused only briefly to alleviate a cold start: after the reused function’s image starts successfully, its traffic is forwarded to the newly established image container, and the borrowed resources are released again. This entire sequence completes within a few hundred milliseconds to a few seconds. In view of these two kinds of resource waste, our next research step will focus on fragmentation resource management and on providing cold-start alleviation for more new function calls.
- When setting the idle time threshold, we consider the situation of a general function container; for example, we set the threshold to 30 s in our research. In theory, different types of functions call for different time thresholds. If the actual environment hosts only a few functions, we can set the threshold manually, but with many function types this approach significantly increases labor costs. Moreover, any conflict between such a static configuration and the function’s actual operation must still be resolved manually, which is laborious and time-consuming. Therefore, our next research plan is to set the time threshold dynamically based on a function’s actual running status, so that idle function containers can be identified quickly and accurately.
- When no container resources in the container pool meet a function’s needs, the function is forced into a full cold start. To alleviate this extreme case, we will explore automatic scaling of the prewarm container pool based on the state of the idle function container pool, the low usage function container pool, and the functions waiting to start. Of course, prewarming the container pool may require additional resources. Combining this with the first discussion point above, we will convert remaining unused resources into new container resources awaiting reuse, seeking a good balance between resource utilization and cold-start latency. If this approach still cannot provide sufficient resources for the function, we may consider vertical scaling to solve the problem.
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Google. Google Cloud Functions. 2020. Available online: https://cloud.google.com/functions (accessed on 18 December 2022).
- IBM. IBM Cloud Functions. 2020. Available online: https://www.ibm.com/cloud/functions (accessed on 10 December 2022).
- Malik, S. Azure Functions. 2020. Available online: https://azure.microsoft.com/en-us/services/functions (accessed on 22 December 2022).
- Amazon Web Services. AWS Lambda. 2020. Available online: https://aws.amazon.com/lambda (accessed on 7 January 2023).
- Docker. Home Page. 2021. Available online: https://www.docker.com (accessed on 7 January 2023).
- GitHub. Google Container Runtime Sandbox. 2021. Available online: https://github.com/google/gvisor (accessed on 10 January 2023).
- Kata Containers. Home Page. 2021. Available online: https://katacontainers.io (accessed on 20 January 2023).
- Agache, A.; Brooker, M.; Iordache, A.; Liguori, A.; Neugebauer, R.; Piwonka, P.; Popa, D.M. Firecracker: Lightweight virtualization for serverless applications. In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI’20), Santa Clara, CA, USA, 25–27 February 2020; pp. 419–434. [Google Scholar]
- Battula, S.K.; Garg, S.; Montgomery, J.; Kang, B. An Efficient Resource Monitoring Service for Fog Computing Environments. IEEE Trans. Serv. Comput. 2020, 13, 709–722. [Google Scholar] [CrossRef]
- Mohan, A.; Sane, H.; Doshi, K.A.; Edupuganti, S. Agile cold starts for scalable serverless. In Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud’19), Renton, WA, USA, 8 July 2019; Available online: https://www.usenix.org/conference/hotcloud19/presentation/mohan (accessed on 22 January 2023).
- Naha, R.K.; Garg, S.; Battula, S.K.; Amin, M.B.; Georgakopoulos, D. Multiple Linear Regression-Based Energy-Aware Resource Allocation in the Fog Computing Environment. Comput. Netw. 2022, 216, 109240. [Google Scholar] [CrossRef]
- Battula, S.K.; Naha, R.K.; Kc, U.; Hameed, K.; Garg, S.; Amin, M.B. Mobility-Based Resource Allocation and Provisioning in Fog and Edge Computing Paradigms: Review, Challenges, and Future Directions. Mob. Edge Comput. 2021. [Google Scholar] [CrossRef]
- Mahmoudi, N.; Lin, C.Y.; Khazaei, H.; Litoiu, M. Optimizing serverless computing: Introducing an adaptive function placement algorithm. In Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering (CASCON’19), New York, NY, USA, 4–6 November 2019; pp. 203–213. [Google Scholar]
- Mcgrath, G.; Brenner, P.R. Serverless computing: Design, implementation, and performance. In Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops (ICDCS Workshops’17), Atlanta, GA, USA, 5–8 June 2017; pp. 405–410. [Google Scholar]
- Lee, H.; Satyam, K.; Fox, G.C. Evaluation of production serverless computing environments. In Proceedings of the 11th IEEE International Conference on Cloud Computing (CLOUD’18), San Francisco, CA, USA, 2–7 July 2018; pp. 442–450. [Google Scholar]
- Amazon. Enabling API Caching to Enhance Responsiveness. Available online: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html (accessed on 12 January 2023).
- Jenkins. DevOps CI Tool. Available online: https://www.jenkins.io (accessed on 21 January 2023).
- Barlev, S.; Basil, Z.; Kohanim, S.; Peleg, R.; Regev, S.; Shulman-Peleg, A. Secure yet usable: Protecting servers and Linux containers. IBM J. Res. Dev. 2016, 60, 12:1–12:10. [Google Scholar] [CrossRef]
- Ye, K.; Wu, Z.; Wang, C.; Zhou, B.B.; Si, W.; Jiang, X.; Zomaya, A.Y. Profiling-based workload consolidation and migration in virtualized data centers. IEEE Trans. Parallel Distrib. Syst. 2015, 26, 878–890. [Google Scholar] [CrossRef]
- Hall, A.; Ramachandran, U. Opportunities for Optimizing the Container Runtime. In Proceedings of the 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC), Seattle, WA, USA, 5–8 December 2022; pp. 265–276. [Google Scholar]
- Oakes, E.; Yang, L.; Zhou, D.; Houck, K.; Harter, T.; Arpaci-Dusseau, A.; Arpaci-Dusseau, R. SOCK: Rapid task provisioning with serverless-optimized containers. In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC’18), Boston, MA, USA, 11–13 July 2018; pp. 57–70. [Google Scholar]
- Thalheim, J.; Bhatotia, P.; Fonseca, P.; Kasikci, B. Cntr: Lightweight OS containers. In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC’18), Boston, MA, USA, 11–13 July 2018; pp. 199–212. [Google Scholar]
- Microsoft. Isolation Modes. Available online: https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container (accessed on 24 February 2023).
- Madhavapeddy, A.; Mortier, R.; Rotsos, C.; Scott, D.; Singh, B.; Gazagnaire, T.; Smith, S.; Hand, S.; Crowcroft, J. Unikernels: Library operating systems for the cloud. In Proceedings of the Architectural Support for Programming Languages and Operating Systems (ASPLOS’13), New York, NY, USA, 16–20 March 2013; pp. 461–472. [Google Scholar]
- Microsoft. Azure Functions Premium Plan. Available online: https://docs.microsoft.com/en-us/azure/azure-functions/functions-premium-plan (accessed on 28 February 2023).
- Xu, Z.; Zhang, H.; Geng, X.; Wu, Q.; Ma, H. Adaptive function launching acceleration in serverless computing platforms. In Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems (ICPADS’19), Los Alamitos, CA, USA, 4–6 December 2019; pp. 9–16. [Google Scholar]
- Anwar, A.; Mohamed, M.; Tarasov, V.; Littley, M.; Rupprecht, L. Improving Docker registry design based on production workload analysis. In Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST’18), Oakland, CA, USA, 12–15 February 2019; pp. 265–278. [Google Scholar]
- Shahrad, M.; Fonseca, R.; Goiri, I.; Chaudhry, G. Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC’20), 15–17 July 2020; pp. 205–218. [Google Scholar]
- Harter, T.; Salmon, B.; Liu, R.; Arpaci-Dusseau, A.C.; Arpaci-Dusseau, R.H. Slacker: Fast distribution with lazy Docker containers. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST’16), Santa Clara, CA, USA, 27 February–2 March 2016; pp. 181–195. [Google Scholar]
- Bentaleb, O.; Belloum, A.S.Z.; Sebaa, A.; El-Maouhab, A. Containerization technologies: Taxonomies, applications and challenges. J. Supercomput. 2022, 78, 1144–1181. [Google Scholar] [CrossRef]
- Chang, C.C.; Yang, S.R.; Yeh, E.H.; Lin, P.; Jeng, G.Y. A Kubernetes-based monitoring platform for dynamic cloud resource provisioning. In Proceedings of the 2017 IEEE Global Communications Conference (GLOBECOM’17), Los Alamitos, CA, USA, 4–8 December 2017; pp. 1–6. [Google Scholar]
- Viil, J.; Srirama, S.N. Framework for automated partitioning and execution of scientific workflows in the cloud. J. Supercomput. 2018, 74, 2656–2683. [Google Scholar] [CrossRef]
- Chen, L.H.; Shen, H.Y. Considering resource demand misalignments to reduce resource over provisioning in cloud datacenters. In Proceedings of the 2017 IEEE Conference on Computer Communications (INFOCOM’17), Los Alamitos, CA, USA, 1–4 May 2017; pp. 1–9. [Google Scholar]
- Ling, W.; Ma, L.; Tian, C.; Hu, Z. Pigeon: A dynamic and efficient serverless and FaaS framework for private cloud. In Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence (CSCI’19), Las Vegas, NV, USA, 5–7 December 2019; pp. 1416–1421. [Google Scholar]
- Kaffes, K.; Yadwadkar, N.J.; Kozyrakis, C. Centralized core-granular scheduling for serverless functions. In Proceedings of the ACM Symposium on Cloud Computing (SoCC’19), New York, NY, USA, 21–23 November 2019; pp. 158–164. [Google Scholar]
- Guan, X.J.; Wan, X.L.; Choi, B.Y.; Song, S.; Zhu, J.F. Application oriented dynamic resource allocation for data centers using Docker containers. IEEE Commun. Lett. 2017, 21, 504–507. [Google Scholar] [CrossRef]
- Daw, N.; Bellur, U.; Kulkarni, P. Xanadu: Mitigating cascading cold starts in serverless function chain deployments. In Proceedings of the 21st International Middleware Conference (Middleware’20), New York, NY, USA, 7–11 December 2020; pp. 356–370. [Google Scholar]
- Baldini, I.; Cheng, P.; Fink, S.J.; Mitchell, N. The serverless trilemma: Function composition for serverless computing. In Proceedings of the 2017 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Onward! Vancouver, BC, Canada, 22–27 October 2017; pp. 89–103. [Google Scholar]
| Virtualization | Startup Latency (ms) | Isolation Power | OS Kernel | Hotplug | OCI Supported |
|---|---|---|---|---|---|
| Traditional VM | >1000 | Strong | Unsharing | No | Yes |
| Docker | 50–500 | Weak | Host-sharing | Yes | Yes |
| SOCK | 10–50 | Weak | Host-sharing | Yes | Yes |
| Kata | 100–500 | Strong | Unsharing | Yes | Yes |
| Hyper-V | >1000 | Strong | Unsharing | Yes | Yes |
| gVisor | 100–500 | Strong | Unsharing | No | Yes |
| Firecracker | 100–500 | Strong | Unsharing | No | Yes |
| Unikernel | 10–50 | Strong | Built-in | No | No |
| Option | Configuration |
|---|---|
| Node | CPU: Intel(R) Xeon(R) Gold 6226R @ 2.90 GHz, 8 cores; DRAM: 16 GB; Disk: 100 GB SSD |
| Software | Operating system: Linux kernel 4.15.0; Docker: 20.10.13; runc: 1.0.3; containerd: 1.5.10 |
| Container | Container runtime: Python 3.10.0 on Linux kernel 4.15.0; function container limit: 20 per function per node; OpenWhisk prewarm pool size: 2 per node |
Share and Cite
Li, B.; Zhan, Y.; Ren, S. A Fast Cold-Start Solution: Container Space Reuse Based on Resource Isolation. Electronics 2023, 12, 2515. https://doi.org/10.3390/electronics12112515