1. Introduction
Measurement technology has evolved into wearable hardware devices, but we can also use observers implemented as parts of software that transmit monitored values of system variables and collect them in databases. Usually, capturing such data from production servers is technically difficult or relatively expensive. Furthermore, tests of the production system are either not possible or are carried out too late. However, in many cases, it is necessary to test the hardware and software environment already at the early stages of its preparation. Analysis of an Internet system is a complex and time-consuming task and requires appropriate preparation on both sides: the software and the hardware. Log files report information that is crucial for diagnosing the root cause of complex problems. Administrators of most user-facing systems depend on periodic log data to assess the status of production applications. Experimental environments make it possible to create an ecosystem for collecting and processing data about a specific environment so that it can be monitored, managed, and controlled more easily and efficiently. Therefore, log data is an essential and valuable resource for online service systems.
In this article, an experimental environment will be presented, as a case study, based on a container structure. Furthermore, a web application (
) was developed, whose task is to execute the appropriate task algorithms. Modern web systems provide multiple services that are deployed through complex technologies. Thus, this approach, on the software and server side, is based on the latest programming technologies and containers running in the cloud. The proposed application (
) is a new concept based on the DayTrader Java EE application [
1] originally developed by IBM as the Trade Performance Benchmark Sample. DayTrader is a benchmark application built around the paradigm of an Online Stock Trading System (OSTS). In order to automate the testing process of the prepared application, an additional application was constructed, which is an automatic client (
), which generates players and performs requests on the OSTS. The combination of these two applications allowed for the preparation of a benchmark that was used to analyze web requests. The process of playing on the OSTS was analyzed with predetermined query scenarios (
,
,
). The OSTS tasks include receiving and then processing purchase and sale offers while measuring the duration of individual operations, conducting transactions, i.e., executing previously created offers, and measuring the CPU and RAM consumption of each container.
Based on this approach, it was possible to analyze the behavior of the OSTS and the processing of requests under increased load. Data was obtained in tests for various test parameters on two different hardware architectures and with a varying number of R Docker replicas, because the benchmark program was placed in a prepared container environment. This solution uses a system architecture in which communication is mediated by a message queuing server.
Modern system development and operations rely on monitoring and understanding system behavior in production. Behavior analysis was performed and used for client traffic classification. It was possible to identify customer traffic based on the obtained system parameters and the application processing. Traditional analysis and prediction methods in cloud computing provide unidimensional output. However, unidimensional output cannot capture the relationship between multiple dimensions, which results in limited information and inaccurate results. The precise determination of customer behavior is very difficult, but with the use of multidimensional hardware and software factors and the definition of trends in clients' behavior, it was successful.
The rest of this article is organized as follows. We discuss related work and introduce our previous models in
Section 2.
Section 3 presents our solution based on hardware and software elements. This section contains mechanisms describing the system, the operation of the OSTS, and the game algorithms implemented by the generator. In
Section 4, we evaluate the usefulness of our benchmark for analysis in the domain of the web system. Finally,
Section 5 presents the conclusions and future work.
2. Related Works
In recent years, novel applications have emerged that benefit from automated log-file analysis, for example, real-time monitoring of system health, understanding user behavior, and extracting domain knowledge. In [
2,
3], we can find a systematic review of recent literature (covering the period between 2000 and June 2021) related to automated log analysis. Application logs record the behavior of a system during its runtime, and their analysis can provide useful information. Log data is used in anomaly detection, root cause analysis, behavior analysis, and other applications. In this section, we discuss related work on this topic for web system structures. We divided the related work into two parts: software and hardware. Some articles present several methods to design new and improve existing web systems that, even under unpredictable load variations, have to satisfy performance requirements [
4]. Reliability testing is a significant method to ensure the reliability and quality of systems. The proposition in [
5] of a taxonomy to organize works focusing on the prediction of failures could help in the context of web structure performance. This taxonomy classifies related work along the dimensions of the prediction target (e.g., anomaly detection, performance prediction, or failure prediction), the time horizon (e.g., detection or prediction, online or offline application), and the applied modeling type (e.g., time series forecasting, machine learning, or queueing theory). We also found works showing that understanding workloads and modeling their performance is important for optimizing systems and services. The main models were presented in [
6,
7] and use Petri Nets.
8] presents a method for setting the input parameters of a production system. In [
9], authors try to understand and model storage workload performance. They analyzed over 250 traces across 5 different workload families using 20 widely used distributions. Publications [
10,
11] use stochastic formalisms for the performance engineering of a web system and compare their own models with the performance of the production system. We based our approach on strategies and techniques that can be used in practice to derive the values of common metrics, including event-driven, tracing, sampling, and indirect measurement, as proposed in [
12]. Furthermore, some of them could be applied generally to other types of metrics.
The load generators [
13,
14] that define web workloads imitate the behavior of thousands of concurrent users in a web browser. Existing generators mostly use different distributions for representing the time between requests (client think time) [
15].
We found many older benchmarks, but they are all based on older types of systems. WebTP [
16] is a benchmark that measures the performance of a web information subsystem. In [
17], we can find old techniques of performance testing and various diagnostic tools to implement testing. In recent years, several new tools and methodologies have been used to evaluate and measure the quality of web systems. For example, we could find [
18], which checks conformance with respect to the requirements (compatibility testing). In this context, one challenge for analysis is how to execute, in a correct and efficient way, multiple test cases that may cover several environments and functionalities of the tested applications while reducing the consumed resources and time. The existing approaches suffer from several limitations when deployed in practice [
19]: inability to deal with various logs and complex log abnormal patterns, poor interpretability, and lack of domain knowledge. Logfile anomaly detection is vital for service reliability engineering. Paper [
19] proposes a generic log anomaly detection system based on ensemble learning. They conducted an empirical study and an experimental study based on large-scale real-world data. In [
20], the authors conducted a comprehensive study on log analysis at Microsoft. This article uncovers the real needs of industrial practitioners and the unnoticed yet significant gap between industry and academia. Debnath [
21] presents a real-time log analysis system that automates the process of detecting anomalies in logs. This system runs at the core of a commercial log analysis solution that handles millions of logs generated from large-scale industrial environments. In [
22], the authors offered a method for conceptualizing and developing a real-time log acquisition, analysis, visualization, and correlation setup for tracking and identifying the main security events.
New technologies have not only offered new opportunities but also have posed challenges to hardware and software reliability technology. In [
23], the technologies of software reliability testing were analyzed, including reliability modeling, test case generation, reliability evaluation, testing criteria, and testing methods. The framework proposed in [
24] can predict the resource utilization of physical machines. This framework consists of two parts: a noise reduction algorithm and a neural network. Davila-Nicanor, in paper [
25], presents a process to estimate test case prioritization on Web systems. The results become a guide to establish test coverage through the knowledge of the most critical paths and components of the system. The newest and the most common proposed hardware behavior predictions are based on machine learning techniques [
26,
27]. In [
26], the authors propose a novel Prediction mOdel based on SequentIal paTtern mINinG (POSITING) that considers the correlation between different resources and extracts behavioral patterns. Based on the extracted patterns and the recent behavior of the application, the future demand for resources is predicted. Reliability, availability, and maintainability aspects are critical for an engineering design and were investigated in [
28]. These aspects concern a system’s sustained capability throughout its useful life. The authors in [
29] provided a methodology that results in the successful integration of Reliability, Availability, and Maintainability with Model-Based Systems Engineering that can be used during the early phases of design.
Paper [
30] analyzes the time-sharing system and the network connection, by exploring internal computer processors. Some works [
31] are related to the detection of errors in the process of static software analysis. Said et al. [
32] presented a straggler identification model for distributed environments using machine learning. This model uses several parameters extracted by the execution of various types and large-scale jobs. In the paper [
33], the authors presented a study of workload prediction in the cloud environment.
In many cases, traffic analysis and classification of web system requests only include models of native architectures. All use experiments to verify the proposed classifications. However, we were unable to find an approach based on a container architecture. Furthermore, we could not find an approach applicable to up-to-date web software framework tools. Some authors present tools for the run-time verification of quantitative specifications of applications. The PSTMonitor from [
34] detects executions that deviate from the expected probabilistic behavior. CaT [
35] is a nonintrusive content-aware tracking and analysis framework. CaT can improve the analysis of distributed systems. The paper [
36] presents LogFlow, a tool to help human operators in the analysis of logs by automatically constructing graphs of correlations between log entries. The core of LogFlow is an interpretable predictive model based on a Recurrent Neural Network.
4. Analysis of Requests Traffic
One of the ways to guarantee high-quality applications is through testing. Testing is an important aspect of every software and hardware development process that companies rely on to elevate all their products to a standardized set of reliable software applications while ensuring that all client specifications are met. Due to the lack of traces from a real container system, we instead use an application benchmark for the prepared scenario. All analyses require a set of input parameters. The concrete set of input parameters differs depending on the underlying system architecture and the particular test.
This section describes the test setup used to obtain the measurement traces. The benchmark allows us to use measurements for performance parameters. A set of experiments was conducted and the results were analyzed. We collected observations of the arrival times, execution times of individual requests, and average CPU utilization during each experiment run. The benchmark execution times were measured.
The tests have been grouped into 4 characteristic groups examining different characteristics:
group examining the impact of the number of replicas ().
group examining the impact of time between transaction execution ().
group examining the impact of time between player requests ().
group investigating the impact of scenarios used by players (, , ).
This study also takes into account the impact of the physical factor, i.e., the performance of the hardware itself on which the tested application is launched. The tests were carried out on two different servers with different architectures. The first architecture has 8 processors and 20 GB of RAM (), while the second has 12 processors and 30 GB of memory ().
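The resulting test matrix can be sketched as a small configuration structure. All parameter names and values below are illustrative assumptions, not the actual harness used in the experiments:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class TestConfig:
    architecture: str    # e.g., "A1" (8 CPUs, 20 GB) or "A2" (12 CPUs, 30 GB)
    replicas: int        # number of OSTS container replicas
    tx_interval_s: int   # delay between transaction-algorithm runs
    think_time_s: float  # pause between player requests
    strategy: str        # player strategy identifier

def build_matrix():
    # Illustrative values only; the experiments used their own parameter sets.
    architectures = ["A1", "A2"]
    replica_counts = [2, 4, 6]
    return [TestConfig(a, r, 60, 1.0, "mixed")
            for a, r in product(architectures, replica_counts)]
```

Each configuration then maps to one benchmark run per duration (1 h to 12 h), mirroring how each test group varies one parameter at a time.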
4.1. Game Strategies
Each generated user (player) has its specific class and according to that it takes actions on the OSTS. Before the test, parameters are given that determine how many players should be launched for a given algorithm. The results obtained depend on the values of the given parameters.
The first algorithm is “Buy and sell until resources are used”, hereinafter referred to as
. Its operation diagram is shown in the figure (
Figure 3). The second algorithm is “Buy and sell alternately”, hereinafter referred to as
. The scheme of its operation is shown in the figure (
Figure 4). The third algorithm is “Just browse”, hereinafter referred to as
. Its diagram is shown in the figure (
Figure 5). The difference between the
and
algorithms is that the
algorithm does not run out of resources, i.e., it adds one buy then one sell, while the
algorithm adds a buy offer in a loop until the player’s resources run out, then adds sales offers until the player’s resources run out as well. The
algorithm does not affect the expansion of data in the database of the
application because these are read-only operations.
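The three strategies described above can be sketched as follows; the `Player` and `ToyClient` types here are hypothetical stand-ins for the generator's real client interface:

```python
class Player:
    """Minimal player state: cash to spend and shares held."""
    def __init__(self, cash, shares=0):
        self.cash = cash
        self.shares = shares

class ToyClient:
    """Hypothetical stand-in for the OSTS client; one fixed share price."""
    PRICE = 10
    def quote(self):
        return self.PRICE
    def place_buy(self, player):
        player.cash -= self.PRICE
        player.shares += 1
    def place_sell(self, player):
        player.shares -= 1
        player.cash += self.PRICE
    def list_offers(self):
        return []  # read-only browsing

def buy_and_sell_until_exhausted(client, player):
    """First strategy: add buy offers in a loop until the player's cash
    runs out, then add sell offers until the shares run out as well."""
    while player.cash >= client.quote():
        client.place_buy(player)
    while player.shares > 0:
        client.place_sell(player)

def buy_and_sell_alternately(client, player):
    """Second strategy: one buy, then one sell, so resources never run out."""
    client.place_buy(player)
    client.place_sell(player)

def just_browse(client, player):
    """Third strategy: read-only operations; does not grow the database."""
    return client.list_offers()
```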
4.2. Tests Configuration
Each of the 18 tests (with 4 traces per test) presented in the table (
Table 2) was also performed for different time ranges: 1 h, 3 h, 6 h, 9 h, and 12 h (360 logs per architecture):
—is characterized by a change in the number of replicas,
—is characterized by different times between transactions,
—is characterized by different times between requests,
—is characterized by a different number of players realizing a given algorithm (strategy).
After the set time limit had elapsed, it was possible to download the data collected by the benchmark:
logs of queries made by the player, e.g., issuing an offer,
logs of consumption parameters of replicas—CPU and RAM memory,
logs regarding the number of issued purchase/sale offers,
CPU and RAM memory usage logs for .
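As a sketch of how such downloaded logs might be post-processed, the snippet below summarizes a request log in a hypothetical CSV layout (the column format is an assumption, not the benchmark's actual one):

```python
import csv
import io
from statistics import mean

# Hypothetical log layout: arrival_timestamp,request_type,duration_ms
SAMPLE_LOG = """1668000000.120,place_buy,42
1668000000.870,place_sell,55
1668000001.500,browse,12
"""

def summarize(log_text):
    """Compute the request count, mean execution time, and mean
    inter-arrival time from one trace of the request log."""
    rows = list(csv.reader(io.StringIO(log_text)))
    arrivals = [float(t) for t, _, _ in rows]
    durations = [float(d) for _, _, d in rows]
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return {
        "requests": len(rows),
        "mean_duration_ms": mean(durations),
        "mean_interarrival_s": mean(gaps),
    }
```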
4.3. Experiment Results
In this subsection, we present the results obtained from the benchmark tests. In many production situations, direct measurement of resource demands is not feasible. Benchmark testing is a normal part of the application development life cycle and is performed on a system to determine performance. Finally, we describe the conducted experiments in detail and present the results obtained from them. During the experiments, we monitored each container as well as every request. Benchmark tests are based on repeatable environments. We carried out experiments in realistic environments to obtain measurement traces.
4.3.1. Impact of the Number of Replicas of the Stock Exchange Application on Performance
The OSTS is scalable; i.e., it allows the number of replicas of the application to be set in order to increase the efficiency and responsiveness of the entire system. The desired scalability of the OSTS has been implemented with the use of containerization software and its built-in Swarm mode, which provides this mechanism. As might be expected, increasing the number of replicas of the OSTS should allow the server to better use the available resources and improve the operation of the application itself. In the case of the
architecture, processor utilization increases by at most a few percent after increasing the number of replicas from 2 to 6, in the
test group (
Figure 6), which allows for a slightly larger number of requests during a given time limit
T, e.g., offers to buy and sell shares (
Figure 7 and
Figure 8).
4.3.2. Impact of Architecture on the Stock Exchange Application Operation
The obvious fact is that the operation of the system in terms of its performance depends on the platform on which it is running. Carrying out system load tests in a test environment allows us to answer the question of how the application will behave, e.g., in the case of a very high load, and whether the tested architecture has sufficient resources to handle all incoming requests. This is a very important consideration when working on more sensitive systems that need to be available 24/7 with downtime kept to a bare minimum. Both tested hardware architectures,
and
, meet the requirements of all test scenarios, i.e., they allow for their trouble-free completion within the desired time limit, and there are no complications related to the lack of hardware resources. As a consequence, the generated logs are complete. The mixed
test (
) simulates conditions closer to real ones (each player performs different actions and works according to a different scheme) and to some extent represents the approximate load of a real system. According to the chart (
Figure 9), we could decide which architecture solution we are going to use. By choosing the
solution, we will meet the demand for server resources from clients, but for more future-proofing, this solution may not be enough as the popularity of the service increases (peak load is a maximum of 75%). An alternative is the
architecture, which will provide a greater reserve of computing power (peak load lower by 15%), and thanks to the application scalability mechanism, it is possible to change the server’s hardware configuration. Similarly, in the case of RAM memory, by monitoring current consumption, we could determine whether its level is sufficient (
Figure 10).
4.3.3. Impact of the Type of User Requests on the Stock Exchange Application Load
The characteristics of the system load depend primarily on the type of requests processed by the system, i.e., the actions performed at a given moment by the players. As can be seen in the figure (
Figure 11), the time course of the experimental data series in most tests is sinusoidal when players use the strategies
and
(buying and selling stocks). This is because an algorithm has been implemented in the OSTS that handles transactions every
interval, due to which there is a break between generated requests. Only in the scenario of 200 concurrent
players, where users only view offers (
) strategy), is the load relatively constant, with lower amplitude fluctuations in the graph. In summary, the transaction algorithm plays a key role in the load on the OSTS and stresses the system the most when it performs its task. The transaction algorithm is discussed in
Section 3.4.
4.3.4. The Influence of the Transaction Algorithm on the Operation of the Stock Exchange Application
The transaction algorithm executes transactions, i.e., it matches buy and sell offers based on the queuing mechanism (First In, First Out) and checks a number of conditions that have to be met for the exchange of shares to take place. An important parameter of the OSTS is the time interval () in which successive batches of buy/sell offers will be processed by this algorithm. Setting it to the right value can have a positive effect on the processing of offers and a correspondingly beneficial effect on the load balancing of the server. Time periods of 1 to 5 min of delay between processing runs have been tested.
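A minimal sketch of such a FIFO matching cycle is shown below. The price-based matching condition is an illustrative assumption, since the exact conditions checked by the OSTS are not detailed here:

```python
from collections import deque

def run_transaction_cycle(buy_queue, sell_queue):
    """One run of the periodic matching cycle: pair the oldest buy offer
    with the oldest sell offer while the bid covers the ask. The price
    rule is an assumed example of the conditions checked before a trade."""
    executed = []
    while buy_queue and sell_queue:
        if buy_queue[0]["price"] >= sell_queue[0]["price"]:
            executed.append((buy_queue.popleft(), sell_queue.popleft()))
        else:
            break  # queue heads cannot match; leave them for the next cycle
    return executed

buys = deque([{"player": 1, "price": 12}, {"player": 2, "price": 9}])
sells = deque([{"player": 3, "price": 10}, {"player": 4, "price": 11}])
executed = run_transaction_cycle(buys, sells)
```

Running this cycle once per configured interval reproduces the batch-wise processing described above: offers accumulate between runs and are cleared in arrival order.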
For the
architecture (
Figure 12), the most appropriate delay between transactions
was a time lower than 60 [s]. Executing requests on a regular basis results in lower consumption of server resources and prevents a long queue of buy/sell offers from forming (
Figure 13). This results in a smaller number of processed offers within
(
Figure 14), leading to less impact on server resources (
Figure 12).
Considering the second
architecture, we also observed the same behavior with respect to the above-mentioned relationships. It is interesting that for
= 60 [s],
generates a linear load, except for tests with a higher delay (
Figure 15). This phenomenon was also observed in longer tests. To sum up, the appropriate setting of the
delay parameter for the transaction algorithm positively affects the operation of the OSTS; however, the correct operation of the system should be verified by examining its time logs. Values of this delay that are too low or too high cause problems with the functioning of the application, as has been verified.
4.3.5. Time between Requests
The last relationship discovered in the log analysis process was based on the
scenario group analyzing the impact of the
parameter (time between player queries) on the final logs of the OSTS. The smaller the time interval (think time), the more the system is loaded with player requests and, therefore, requires more resources. The think time adds delay in between requests from generated clients [
4]. Of course, it is unrealistic for each player to perform various actions on the website in such short time intervals, but the simulation clearly shows the high impact of this parameter on the test results (
Figure 16). In addition, another dependency that could be read from the charts was verified: the CPU consumption is the same on each container.
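The effect of think time on the achievable request volume can be illustrated with a simple closed-loop estimate; the service and think times below are made-up values, not measurements from the experiments:

```python
def requests_in_window(window_s, service_time_s, think_time_s):
    """Rough closed-loop estimate: a sequential player alternates one
    request (service_time_s) with one pause (think_time_s), so the number
    of requests it can issue in a window is bounded by the cycle length."""
    cycle = service_time_s + think_time_s
    return int(window_s // cycle)

# Made-up times: shrinking the think time sharply raises the per-player
# request volume, and hence the load placed on the system.
fast = requests_in_window(3600, 0.05, 0.5)   # 0.55 s per request cycle
slow = requests_in_window(3600, 0.05, 2.0)   # 2.05 s per request cycle
```

This matches the observation above: the smaller the interval between player queries, the more requests arrive per unit time and the more resources the system needs.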
5. Conclusions
In the era of intelligent systems, the performance, reliability, and safety of systems have received significant attention. Predicting all aspects of the system during the design phase allows developers to avoid potential design problems, which could otherwise result in reconstructing an entire system when discovered at later stages of the system or software development life cycle. To ensure the desired level of reliability, software engineering provides a plethora of methods, techniques, and tools for measuring, modeling, and evaluating the properties of systems.
In this article, a novel design concept is presented as a case study of a container-based web system in the cloud. The aim of doing so is to demonstrate the key changes in systems design activities to address reliability and related performance. It is worth noting that web system modeling is helpful in the context of both correlated efficiency growth and behavior recognition.
We designed and implemented a tool for an expanded analysis based on performance parameters from logs. We conducted a long-term study of the Online Stock Trading System (OSTS). We applied approaches for analysis using the system logs of the benchmark for different workload scenarios and different hardware/software parameters. Along with measurements, we presented some conclusions about the characteristics of requests. Lastly, we evaluated these values in relation to the suitability for recognizing the request stream.
A benchmark was prepared based on a container structure running in the cloud, which consists of elements such as exchange replicas, a traffic generator, a queuing server, an OSTS database, and a traffic generator database. The task of the generator was to run a test consisting of simulating a certain number of players of the selected class, which then send queries to the OSTS via the queuing server. During this test, data on query processing time and CPU and RAM usage were collected for each container. The next step was to analyze the obtained data. We identified the following evident findings: CPU usage is the same on every replica of the exchange; the more requests in the queue, the longer the processing time; the algorithm generates constant CPU usage; CPU usage is much lower on the architecture; the architecture processes more queries than the architecture; with a short break between requests, the architecture is unable to send more requests quickly; and a short transaction time keeps request-processing times low.
By examining logs, it is possible to gain valuable insights into how the application is functioning and the manner in which it is being used by users. This information can be examined to identify any issues that need to be addressed, as well as to optimize the performance of the application. Logs can be also used to identify elements of the application that can be further optimized. Additionally, log analysis can be useful for identifying trends in the use of the application, allowing for a deeper understanding of user preferences. It remains the choice of system developers to determine how detailed the logs generated by the system will be. Consequently, this can greatly enrich the results of ongoing research and provide valuable information.
The main contribution of this paper is the discussion of the issues (e.g., business models) involved in creating benchmark specifications for up-to-date web systems. The presented results show that there are many possible links between web requests or web traffic and the production system. The proposed benchmark can help by providing guidelines for the construction of a container-based cloud-based web production system. A holistic view of the research effort on logging practices and automated log analysis is key to providing directions and disseminating the state of the art for technology transfer.
This work also highlights several possible future research directions. In future work, the limitations to expanding the presented results should be discussed. The main research topic should center on the use of many request classes in one scenario, which will bring the model closer to reality. Another step could be to check the players’ behavior in the second test scenario and its influence on the response time. Building a performance model with high generalization and providing more interpretable reconstruction results for the data-driven model are important tasks for our future research. Finally, we consider proposing a method of discovering anomalies in web systems and application logs based on user behavior.