Fine-Grained Management for Microservice Applications with Lazy Configuration Distribution
Abstract
1. Introduction
- Management mechanism: We identify the configuration redundancy problem in service meshes and propose distributing configuration on demand to improve memory efficiency. We design the dependency-aware traffic management (DATM) mechanism, which combines monitors with a controller. The mechanism is application-agnostic and non-intrusive: it requires no changes to source or business code.
- Dependency-aware traffic algorithm: We analyze the characteristics of microservices and study the configuration and dependencies of microservice applications. We propose a service dependency extraction algorithm and implement the control-plane traffic controller as plugins, so that configuration can be distributed on demand.
- Evaluation: We evaluate DATM extensively using a comprehensive benchmark. The experimental results show that DATM reduces the storage resource usage of a single agent by 40% to 60% and substantially reduces the number of cluster updates. DATM also makes configuration distribution more efficient, shortening configuration time. Across the entire cluster, the gains are even more pronounced.
2. Problem Analysis
2.1. Service Description
- the first attribute indicates whether the service is up;
- the second identifies whether the service is a business service;
- the third indicates which namespace the service belongs to;
- the fourth is the set of all related destinations of this service. For the service, we have
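The four attributes above map naturally onto a small record type. The sketch below is hypothetical: the class name `ServiceDescriptor`, its field names, and the sample services are illustrative assumptions, since the original symbols were lost. It also shows how a "running business services list" (an entry in the notation table) could be derived from such records.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceDescriptor:
    """Hypothetical record with the four attributes listed above."""
    name: str
    running: bool        # whether the service is up
    is_business: bool    # business service vs. system/infrastructure service
    namespace: str       # namespace the service belongs to
    destinations: frozenset = field(default_factory=frozenset)  # related destinations


def running_business_services(services, namespace=None):
    """Names of services that are up and are business services,
    optionally restricted to a single namespace, in sorted order."""
    return sorted(
        s.name
        for s in services
        if s.running
        and s.is_business
        and (namespace is None or s.namespace == namespace)
    )


services = [
    ServiceDescriptor("productpage", True, True, "default", frozenset({"details", "reviews"})),
    ServiceDescriptor("details", True, True, "default"),
    ServiceDescriptor("reviews", True, True, "default", frozenset({"ratings"})),
    ServiceDescriptor("ratings", False, True, "default"),       # currently down
    ServiceDescriptor("istiod", True, False, "istio-system"),   # not a business service
]
print(running_business_services(services))  # ['details', 'productpage', 'reviews']
```

Making the record frozen keeps descriptors hashable and safe to share between a monitor and a controller that only read them.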
2.2. Problem Scenario
- (1) Redundant Memory Footprint. Because the dependencies between services are unpredictable, Pilot cannot tailor the data it provides to specific requests; instead, it provides all available data based on the existing clusters. When a change is detected, Pilot sends a discovery response containing the full configuration to Envoy, increasing memory overhead. Envoy's memory footprint is proportional to the number of endpoints in its configuration: the more endpoints there are, the more memory is consumed. This growth presents a significant challenge, because delivering the complete configuration from the control plane can easily lead to out-of-memory (OOM) errors.
- (2) Frequent Configuration Updates. In large clusters with many instances and frequent updates, any update triggers delivery of the full configuration to all Envoys, as depicted in Figure 2a. The ideal update frequency of services is shown in Figure 2b,c. When an edge service with no dependencies is updated, only that service is updated and no others are triggered. Other services can be added to record all of the cluster's updates for consolidation. Conversely, when a service with dependencies is updated, only the related services need to be triggered, while the rest remain unaffected.
- (3) Configuration Time Increase. In the Pilot push process, distributing the full configuration lengthens both the control-plane push time and the convergence time. This hinders normal service invocation and degrades Pilot's overall performance and efficiency.
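The contrast in item (2) between a full push and a dependency-aware push can be made concrete by counting how many sidecars receive an update. This is a minimal sketch under one stated assumption: a sidecar is affected only if it belongs to the changed service or if the changed service is one of its destinations. The function names and the four-service mesh are hypothetical.

```python
def affected_sidecars(changed, dependencies):
    """Sidecars that need new configuration when `changed` is updated.

    Assumption: a sidecar is affected if it belongs to the changed service
    itself or if the changed service is one of its destinations (it calls it).
    """
    callers = {svc for svc, deps in dependencies.items() if changed in deps}
    return {changed} | callers


def update_counts(changed, dependencies):
    """(full push, dependency-aware push) number of sidecar updates."""
    full = len(dependencies)  # full push: every sidecar receives the update
    aware = len(affected_sidecars(changed, dependencies))
    return full, aware


deps = {
    "frontend": {"orders", "catalog"},
    "orders": {"payments"},
    "catalog": set(),
    "payments": set(),
}
print(update_counts("frontend", deps))  # (4, 1): no other service calls frontend
print(update_counts("payments", deps))  # (4, 2): payments and its caller orders
```

The gap between the two numbers widens as the mesh grows, because a full push always touches every sidecar.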
2.3. Basic Platform
2.4. Memory Footprint
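Item (1) states that an Envoy's memory footprint grows in proportion to the endpoints in its configuration. Under that linear assumption, the sketch below compares a full push with an on-demand push per sidecar. The per-endpoint cost, the endpoint counts, and the function names are hypothetical placeholders, not measurements from this paper.

```python
def sidecar_memory(num_endpoints, bytes_per_endpoint, base_bytes=0):
    """Linear model: memory grows in proportion to the endpoints in the config."""
    return base_bytes + num_endpoints * bytes_per_endpoint


def full_vs_on_demand(endpoints_per_service, dependencies, bytes_per_endpoint):
    """Per-sidecar (full push, on-demand) memory under the linear model.

    Full push: every sidecar holds every endpoint in the mesh.
    On demand: a sidecar holds only the endpoints of its own destinations.
    """
    total = sum(endpoints_per_service.values())
    return {
        svc: (
            sidecar_memory(total, bytes_per_endpoint),
            sidecar_memory(sum(endpoints_per_service[d] for d in deps), bytes_per_endpoint),
        )
        for svc, deps in dependencies.items()
    }


endpoints = {"a": 10, "b": 20, "c": 30}  # hypothetical instance counts
dependencies = {"a": {"b"}, "b": {"c"}, "c": set()}
print(full_vs_on_demand(endpoints, dependencies, bytes_per_endpoint=1_000))
# {'a': (60000, 20000), 'b': (60000, 30000), 'c': (60000, 0)}
```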
2.5. Configuration Time
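The notation table lists a configuration distribution time together with a push queue time, a network latency, and an agent receive time. Assuming, purely for illustration, that the distribution time is the sum of the other three (the original symbols were lost, so the names below are placeholders rather than the paper's notation):

```latex
T_{\mathrm{dist}} = T_{\mathrm{queue}} + T_{\mathrm{net}} + T_{\mathrm{recv}}
```

Under this assumption, pushing the full configuration, as described in item (3), lengthens whichever of these terms grow with the configuration size C.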
3. DATM Mechanism
- (1) DATM obtains runtime metric information (including the name and status of the invoking service) when a service request arrives in the data plane or when historical invocation information exists. It does so using the monitor, marked as ➀ in Figure 3 and described in Section 3.1, which collects telemetry data from each microservice instance and stores it in a centralized database for processing. This runtime metric information guides the control extractor in determining the optimal runtime configuration.
- (2) GL stands for global service (marked as ➁ and described in Section 3.2). It is an ordinary Envoy proxy that carries the full configuration information of the mesh. GL guarantees access when routing information is missing: when a new request arrives and the corresponding proxy lacks the route, the request triggers access to GL to obtain it.
- (3) The control extractor detects alarm messages from the monitor and queries the collected runtime data to (a) extract the hub services (HB; marked as ➂ and described in Section 3.3) and (b) extract the DEM (marked as ➃ and described in Section 3.3). Using the telemetry data collected in ➀ and the hub services identified in ➂, DATM makes a mitigation decision to reconfigure the sidecar dependencies. The policy behind this decision is generated by a control loop, which compares the existing configuration list with the newly observed changes.
- (4) Finally, Istio's native deployment module (marked ➄ and described in Section 3.4) distributes the specific configuration information as xDS resources that data-plane agents can understand. The actions are verified and executed on the underlying Kubernetes cluster.
3.1. Monitor Coordinator
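Step (1) of the DATM workflow has the monitor turn per-request telemetry into information about which services invoke which. A minimal sketch, assuming each sample carries only a source and a destination service name; the `RequestRecord` type and `call_edges` function are hypothetical. In an Istio deployment, comparable data is exposed by the standard `istio_requests_total` metric through labels such as `source_workload` and `destination_service_name`.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Set


class RequestRecord(NamedTuple):
    """Hypothetical telemetry sample: one observed request between two services."""
    source: str
    destination: str


def call_edges(records: Iterable[RequestRecord], min_calls: int = 1) -> Dict[str, Set[str]]:
    """Aggregate request samples into caller -> {callees} edges,
    keeping only edges observed at least `min_calls` times."""
    counts = Counter((r.source, r.destination) for r in records)
    edges: Dict[str, Set[str]] = {}
    for (src, dst), n in counts.items():
        if n >= min_calls:
            edges.setdefault(src, set()).add(dst)
    return edges


samples = [
    RequestRecord("productpage", "details"),
    RequestRecord("productpage", "reviews"),
    RequestRecord("productpage", "reviews"),
    RequestRecord("reviews", "ratings"),
]
print(call_edges(samples))               # productpage -> details, reviews; reviews -> ratings
print(call_edges(samples, min_calls=2))  # {'productpage': {'reviews'}}
```

The `min_calls` threshold is one simple way to ignore one-off calls; the paper does not state which filter, if any, the monitor applies.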
3.2. Information Acquisition
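Step (2) describes GL as a fallback that holds the full mesh configuration. The routing decision a sidecar faces can be sketched as below; the `GLOBAL_PROXY` address and the `next_hop` function are hypothetical, and recording the miss so that the controller can later push the route is an assumption of this sketch rather than a detail stated in the text.

```python
GLOBAL_PROXY = "global-proxy"  # hypothetical address of the GL proxy


def next_hop(local_routes, destination, misses):
    """Decide where a sidecar forwards a request for `destination`.

    If the lazily loaded local configuration already has a route, use it.
    Otherwise send the request to GL, which carries the full mesh
    configuration, and record the miss so the controller can later push
    the missing route to this sidecar (an assumption of this sketch).
    """
    if destination in local_routes:
        return local_routes[destination]
    misses.add(destination)
    return GLOBAL_PROXY


routes = {"details": "details.default.svc.cluster.local:9080"}
misses = set()
print(next_hop(routes, "details", misses))  # details.default.svc.cluster.local:9080
print(next_hop(routes, "ratings", misses))  # global-proxy
print(misses)                               # {'ratings'}
```

The first request to a new destination pays the extra hop through GL; subsequent requests go direct once the route has been distributed.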
3.3. Control Extractor
Algorithm 1: Hub Service Extraction
Algorithm 2: Service Dependencies Extraction
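The control loop in step (3) compares the existing configuration list with newly observed dependencies. The sketch below shows one reconcile-style iteration; it is not a reconstruction of Algorithm 1 or Algorithm 2, whose details are not reproduced here. `current` and `observed` are hypothetical maps from each service to its destinations. Removing destinations that are no longer observed is a design choice of this sketch; a more conservative loop would only add.

```python
def reconcile(current, observed):
    """One iteration of a reconcile-style control loop (hypothetical sketch).

    `current` maps each service to the destinations already in its sidecar
    configuration; `observed` maps each service to the destinations extracted
    from fresh telemetry. Returns, for every service whose configuration must
    change, the destinations to add and to remove.
    """
    plan = {}
    for svc in sorted(current.keys() | observed.keys()):
        have = current.get(svc, set())
        want = observed.get(svc, set())
        to_add, to_remove = want - have, have - want
        if to_add or to_remove:
            plan[svc] = {"add": sorted(to_add), "remove": sorted(to_remove)}
    return plan


current = {"productpage": {"details"}, "details": set(), "reviews": set()}
observed = {"productpage": {"details", "reviews"}, "details": set(), "reviews": {"ratings"}}
print(reconcile(current, observed))
# {'productpage': {'add': ['reviews'], 'remove': []}, 'reviews': {'add': ['ratings'], 'remove': []}}
```

Services whose configuration already matches are left out of the plan, so nothing is pushed to them.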
3.4. Control Traffic Delivery
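Step (4) distributes the decided configuration through Istio's native machinery. One common way to limit what Istio pushes to a given proxy is a `Sidecar` resource whose `egress.hosts` list contains only the services that proxy needs; this is an assumption about how a dependency decision could be expressed, not a description of the authors' implementation. Hosts use Istio's `namespace/dnsName` form; the `app` workload label and the extra `istio-system/*` entry are illustrative choices.

```python
import json


def sidecar_resource(service, namespace, destinations):
    """Build an Istio Sidecar resource that restricts the configuration pushed
    to `service`'s proxies to the given destinations (all assumed to live in
    `namespace`)."""
    hosts = sorted(f"{namespace}/{d}.{namespace}.svc.cluster.local" for d in destinations)
    hosts.append("istio-system/*")  # keep control-plane services reachable
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "Sidecar",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "workloadSelector": {"labels": {"app": service}},  # assumes an `app` label
            "egress": [{"hosts": hosts}],
        },
    }


print(json.dumps(sidecar_resource("reviews", "default", {"ratings"}), indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed manifest can be applied directly; istiod then translates it into the xDS resources it sends to the selected proxies.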
4. Methodology
- This app models a single catalog entry in an online bookstore that displays book information. It comprises four business services: Productpage, Details, Reviews, and Ratings. The web page shows a book description, details about the book (ISBN, number of pages, etc.), and some reviews of the book.
- Productpage and Details each have one version, while Reviews has three. The services do not depend on Istio, yet together they form a representative service-mesh example: multiple services, written in different languages, with multiple versions of the Reviews service.
- A global service contains the configuration information for all services, e.g., service discovery information, network routing rules, and network security rules.
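Assuming the app described here is Istio's Bookinfo sample (its four services match), its call graph is small enough to compare configuration entries per sidecar directly: productpage calls details and reviews, and reviews calls ratings (only in its v2 and v3 versions). The sketch counts one entry per service, a simplification that ignores versions and endpoints; `destination_entries` is a hypothetical helper.

```python
# Bookinfo call graph: productpage calls details and reviews;
# reviews calls ratings (only its v2 and v3 versions do).
BOOKINFO = {
    "productpage": {"details", "reviews"},
    "details": set(),
    "reviews": {"ratings"},
    "ratings": set(),
}


def destination_entries(graph):
    """Per sidecar: (entries under a full push, entries under an on-demand push).

    Simplification: one entry per service; a full push gives every sidecar
    an entry for every service in the mesh.
    """
    n = len(graph)
    return {svc: (n, len(deps)) for svc, deps in graph.items()}


entries = destination_entries(BOOKINFO)
print(entries)
# {'productpage': (4, 2), 'details': (4, 0), 'reviews': (4, 1), 'ratings': (4, 0)}
print(sum(full for full, _ in entries.values()), sum(lazy for _, lazy in entries.values()))
# 16 3
```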
5. Evaluation
- Loading the configuration in the default mode (Default).
- A control strategy developed using scaffolding (Lazyload).
- Our proposed controller (DATM).
5.1. Memory Footprint Evaluation
5.2. Number of Updates Evaluation
5.3. Time of Configuration Distribution Evaluation
6. Related Works
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
API | Application Programming Interface |
CDS | Cluster Discovery Service |
CPU | Central Processing Unit |
CRD | Custom Resource Definition |
DATM | Dependency-Aware Traffic Management mechanism |
DEM | Dependencies of Microservices |
DR | DestinationRule |
EDS | Endpoint Discovery Service |
FaaS | Function-as-a-Service |
GKE | Google Kubernetes Engine |
GL | Global Service |
GW | Gateway |
HB | Hub Service |
HTTP | Hypertext Transfer Protocol |
MSA | Microservices Architecture |
QoS | Quality of Service |
SDK | Software Development Kit |
SOA | Service-Oriented Architecture |
TSDB | Time Series Database |
VR | VirtualService |
xDS | x Discovery Service |
YAML | YAML Ain't Markup Language |
Notation | Definition |
---|---|
 | a set of microservices; svc is a service |
 | the number of instances of a microservice |
 | instance m of service i |
 | status of service j |
 | whether the service is running |
 | whether the service is a business service |
 | namespaces to which the service belongs |
 | related destinations of this service |
L | number of loads |
C | size of configuration |
U | storage space occupation |
 | configuration distribution time |
 | agent receive time |
 | push queue time |
 | network latency |
 | performance enhancement |
 | memory footprint for a single service |
M | microservice execution history metric |
 | running business services list |
 | priority queue that records all applications in the cluster |
 | services that have a calling relationship |
 | all relationships between receiving services |
Filter Properties | Other Properties |
---|---|
... | ... |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, N.; Wang, L.; Li, X.; Qin, X. Fine-Grained Management for Microservice Applications with Lazy Configuration Distribution. Electronics 2023, 12, 3404. https://doi.org/10.3390/electronics12163404