2. Data Storage and Processing
Business data processing covers all possible operations performed on data; the most common are the modification, transmission, analysis, management, and collection of data. The strategic goal of performing data operations is the transformation of raw data into meaningful information that helps improve the current situation in the company or solve an existing business problem. The output of processing often takes many forms, such as reports, diagrams, and graphics, which make the data easier to understand and analyze [1,2].
Data processing is the process responsible for converting input content, through the systematic execution of operations, into output of a predetermined form [4,12]. The following data processing methods are distinguished [4]: manual data processing, mechanical data processing, and electronic data processing. Electronic data processing can be further divided according to the processing technique:
Batch Processing—gathers data into groups, which are later processed sequentially in the correct order. The processing of subsequent groups of data does not require human intervention; the processes start automatically one after another. The first process starts at a predefined time, most often when the processing systems are loaded as little as possible with other tasks. Thanks to this, it is possible to perform operations on many records in a fairly short time. The processing results are later saved to the database in a format enabling their further use (displaying data in the system, analysis, and creating documents and reports). A minimal sketch follows this list.
Real-Time Processing—processing used to perform operations on data in real time. It is used when the processing results must be available in the shortest possible time; data provided to the software are used immediately.
Multiprocessing—this type of processing forms the basis of all processor-based computing devices. A task or set of operations is shared among processors working together, which reduces the overall execution time of all operations and increases the efficiency of the process. In addition, the available processors operate independently of each other, meaning that the failure of one processor does not stop the entire process, as the other processors continue to run.
Time-Sharing Processing—processing in which a single processor is used by many users. The given operations are performed in different time intervals allocated by the processor to different users. Switching occurs so frequently that every user can expect a response very quickly.
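To make the batch technique concrete, the following minimal sketch is written in Apex, the language of the Salesforce platform examined later in this paper; the Account object, the group size of 200, and the processGroup() helper are illustrative assumptions rather than part of any particular system.

// A minimal sketch of batch-style processing: records are gathered into
// groups, and the groups are then processed sequentially without human
// intervention.
public class SimpleBatchRunner {
    private static final Integer GROUP_SIZE = 200; // illustrative group size

    public static void run(List<Account> records) {
        for (Integer i = 0; i < records.size(); i += GROUP_SIZE) {
            List<Account> grp = new List<Account>();
            for (Integer j = i; j < Math.min(i + GROUP_SIZE, records.size()); j++) {
                grp.add(records[j]);
            }
            processGroup(grp); // the groups are processed one after another
        }
    }

    private static void processGroup(List<Account> grp) {
        // transform the records, then save the results for further use
        update grp;
    }
}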
The data processing process is cyclical. The data processing cycle is a series of steps performed to extract the relevant information. Changes on the market, new customers, and changes in the enterprise lead to the need to repeat the data collection stage, which results in the next data processing cycle. The processing cycle shows how data change their form, facilitating their further interpretation and allowing the final results to be used in effective business decisions [1].
3. Related Work on Cloud Computing Data Storage and Processing
Cloud computing is a model that allows convenient, on-demand access to a shared pool of configurable computing resources (e.g., networks, servers, mass storage, applications, and services) that can be quickly provisioned and released with minimal effort on the part of the service provider [7,8,10,13,14,15].
Cloud computing providers and the services they offer can be compared on several different levels. The offers of service providers differ significantly depending on the model and type of cloud. There are vendors that specialize in providing infrastructure and others that focus on delivering professional systems and software; some try to meet the needs of every cloud computing model. This market is dominated by cloud computing giants such as AWS (Amazon Web Services), Microsoft Azure, Google Cloud Platform, IBM Cloud, and Salesforce.com.
The cloud computing studies [7,8] show that more and more organizations are using the cloud at some level, along with the growing popularity of public and private clouds. Most companies combine different cloud types at the same time. There are many cloud providers, but the market leaders are still AWS, Microsoft Azure, and Google Cloud. The cost of cloud adoption is the biggest challenge for small- and medium-sized enterprises. It is therefore important to consider for what purposes investing in the cloud is necessary. Often, in order to optimize costs, it is recommended to use the advice of specialists who are able to indicate the appropriate adoption tactics for a given company. Another important issue is data security in the cloud. Usually, it is not certain where exactly the data are and who has access to them. Contrary to appearances, resources stored with an external provider are often more secure than those stored inside a given enterprise.
However, the literature also contains many other works on cloud data storage and processing.
The authors of [16] propose a mobile cloud computing model that uses open-source code from distributed computing frameworks such as Hadoop. The model is defined to improve the efficiency of business processing. They also study how to process and analyze unstructured data in parallel within this model and verify whether customized information for individuals can be provided using unstructured data.
Muniswamaiah et al. present a review of the opportunities and challenges of transforming big data using cloud computing resources [17]. They show how big data are used in the decision-making process to gain useful outcomes for business and engineering [17]. Presenting the challenges of processing, they evaluate whether cloud computing is helpful in the advancement of big data by providing computational, networking, and storage capacity.
Kaplancali and Akyol analyze the performance evaluation of small- and mid-sized enterprises (SMEs) from the point of view of cloud computing usage in their activities. The authors conducted a quantitative study on a set of 112 respondents employed in Turkish SMEs. Performance-specific scales were used for the research model, and the obtained results signified that cloud technology has a positive impact on business performance.
The authors of [18] discuss the cloud computing architecture and its numerous services, as well as several security issues in cloud computing based on its service layers. Moreover, several open challenges of cloud computing adoption and its future implications are identified, together with a presentation of the platforms currently available for cloud research and development.
De Donno et al. analyze the foundations and evolution of computing paradigms [19]. They present the evolution of modern computing paradigms and, for each paradigm, show its key points and its relation to the others. The authors address fog computing and its role as the connector between IoT, cloud computing, and edge computing. Moreover, they identify open challenges and future research directions for IoT, cloud computing, edge computing, and fog computing.
Alkasem et al. propose a new methodology for constructing a performance model that optimizes the real-time monitoring of big datasets [20]. The model combines machine learning algorithms with Apache Spark Streaming to realize fine-grained fault diagnosis and repair of big datasets. The authors studied the use case of the failure of virtual machines (VMs) to start up. The proposed methodology ensures that the most sensible action is taken during fine-grained monitoring and produces effective and cost-saving fault repair. The process is performed in three control steps: data collection, an analysis engine, and a decision engine.
The authors of [15] present recent contributions and results in the fields of cloud computing, IoT, and big data technologies and applications. They discuss different concepts of cloud computing technologies from selected points of view for industrial and medical applications.
Forestiero et al. [21] propose a hierarchical approach for workload management in distributed data centers. Its aim is to preserve the autonomy of single data centers while allowing the integrated management of heterogeneous platforms. The described solution is rather generic but, according to the authors, answers the specific requirements of single environments, as shown by a performance analysis of a specific cloud infrastructure composed of four data centers.
Moreover, edge cloud computing technology is related to data processing in enterprises. The authors of [22] describe mobile edge computing as a new technology that enables innovative service scenarios, ensuring optimized network operation and new business opportunities. Mobile edge computing opens up services for consumers, enterprises, and adjacent industries to deliver critical applications and data over the mobile network. Such a solution can support new value chains and use cases across multiple sectors.
Lee et al. [23], within the framework of the fog computing concept, propose a mobile personal multiaccess edge computing (MEC) architecture that utilizes the user's mobile device as a MEC server (MECS) to allow mobile users to receive continuous service delivery. The results shown in the work indicate that the proposed scheme reduces the average service delay and provides more efficient task offloading than other existing MEC schemes.
The work of Wang et al. [24] presents an edge-based auditing method that addresses data security for Internet of Things resources. It proposes an audit model based on a binary tree, assisted by edge computing, to provide computing capability for resource-constrained devices. Offloading the data preprocessing task to the edge makes it possible to reduce the computing load and improve processing efficiency.
The authors of [25] describe the concept of multiaccess edge computing (MEC), which brings computing power and storage resources to the edge of the mobile network. In this way, it allows the mobile user device to run real-time applications. The proposed MEC-based mobility management scheme for arranging MEC servers (MECSs) enables users to receive content and use server resources efficiently even when they move.
More and more companies and organizations are using the cloud, along with the growing popularity of public and private clouds. As discussed above, the cost of cloud adoption remains the biggest challenge for small- and medium-sized enterprises, and data security in the cloud requires careful attention, although resources stored with an external provider are often more secure than those stored inside a given company. More and more companies recognize and appreciate the advantages of cloud computing, which is why its use on the market continues to increase.
5. Analysis of Data Processing Methods in Salesforce Cloud
Each enterprise has its own individual business process needs and a characteristic way of working. The data processing methods described in the previous section differ in their operation, and any of them, when used appropriately, can contribute to an increase in company performance. Proper data processing supports the achievement of the company's business goals and optimizes the work of its employees.
Data processing in the Salesforce.com cloud was performed for operations of type insert, update, and delete. All these operations were performed with the following methods: Apex Batchable, Future Apex, Apex Queueable, Apex Schedulable, and Apex Triggers. The following factors were compared:
total duration of a given transaction;
number of SOQL (Salesforce Object Query Language) queries;
number of DML (Data Manipulation Language) operations;
number of records processed per second;
processing time on the server (Central Processing Unit Time);
memory used during processing (Heap Memory).
A total of 10,000, 50,000, and 100,000 records were inserted into the database, modified, or deleted. The collected results and conclusions are presented below.
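As an illustration of how such measurements can be gathered, the sketch below inserts a list of records inside an Apex Queueable job and logs the compared factors using the standard Limits methods; the Account object and the logging format are illustrative assumptions, not the exact test harness used in the study.

// A sketch of one test case: inserting records asynchronously and logging
// the factors compared in this study.
public class InsertTestJob implements Queueable {
    private List<Account> toInsert;

    public InsertTestJob(List<Account> toInsert) {
        this.toInsert = toInsert;
    }

    public void execute(QueueableContext ctx) {
        Long t0 = System.currentTimeMillis();
        insert toInsert; // the measured DML operation
        Long elapsed = System.currentTimeMillis() - t0;
        if (elapsed == 0) { elapsed = 1; } // avoid division by zero
        System.debug('Records per second: ' + (toInsert.size() * 1000.0 / elapsed));
        System.debug('SOQL queries: ' + Limits.getQueries());
        System.debug('DML statements: ' + Limits.getDmlStatements());
        System.debug('CPU time (ms): ' + Limits.getCpuTime());
        System.debug('Heap memory (B): ' + Limits.getHeapSize());
    }
}

A run is started with System.enqueueJob(new InsertTestJob(records)); the update and delete operations and the remaining methods can be exercised analogously.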
5.5. Comparison and Analysis of the Methods
Table 13, Table 14 and Table 15 show the largest collected numbers of records processed in one second for all the abovementioned methods.
For the insert operation, the best results were obtained with Apex Queueable processing (Table 13). Data were processed most slowly with the Future Apex method. For the Apex Trigger method, it was not possible to collect data on processing 50,000 and 100,000 records.
For the delete operation, the best results were collected with Apex Queueable processing (Table 14). Unfortunately, for most of the tested methods, it was not possible to collect results for processing 50,000 and 100,000 records; this is most commonly caused by mutual locking of records.
For the update operation, no results could be collected for processing 50,000 and 100,000 records with the Future Apex, Apex Schedulable, and Apex Trigger methods. The best results of all tested processing methods were again collected for Apex Queueable (Table 15).
Table 16 summarizes the longest and the shortest data processing times on the server for 10,000 and 100,000 records.
The best data processing results were collected for the Apex Queueable and Apex Batchable methods. For 100,000 records, only these two methods were compared because, for the others, it was not possible to collect the test results (Table 16). Taking into account the results collected during the research, the following conclusions were drawn:
Apex Batchable and Apex Queueable are best at processing large amounts of data; by using them, we can bypass the force.com data processing limits. The more transactions are running, the slower the data are processed, since each data processing operation requires additional server resources. Consequently, the more data are processed during one transaction, the faster the whole process is completed. This rule holds for all tested methods.
Even though Apex Queueable apparently processes records faster, it is possible that on a busier server it works much slower. This may happen because each queued transaction must wait for the completion of the operations of the preceding transaction. Apex Batchable works slower than Apex Queueable because records are divided into groups of up to 2000 records, whereas Apex Queueable can process records in groups of 10,000. Apex Batchable has no record locking issues, because the records to be processed are retrieved from the database only once, in the start() method of the Batchable interface. These records are later divided into groups and distributed among the corresponding number of processes, which are independent of each other. Additionally, Apex Batchable always performs one SOQL (Salesforce Object Query Language) operation less than the other methods: what happens in the start() method does not count toward the transaction processing limits. Apex Batchable is suitable for performing the same operations on a large number of records and guarantees quick completion of the whole processing (a sketch of both methods follows this list).
Apex Queueable can be useful not only when processing a large group of data but also when processing transactions that depend on each other, i.e., when the start of the second transaction depends on whether the previous one was successful. Each operation requires its predecessor to be completed, which is guaranteed by using Apex Queueable processing together with making the queued tasks dependent on each other (transaction chain); see the second sketch after this list.
Apex Batchable, Apex Queueable, Apex Schedulable, and Future Apex represent asynchronous processing. They run in the background without preventing system users from performing other duties. This type of processing is suitable for long, complex operations whose results the user may inspect at a later time, such as making monthly summaries or generating reports and documents. Apex Schedulable additionally allows developers to set an exact date when the processing is to begin; it is also possible to set the repeatability of a given process.
Triggers are not suitable for processing a large number of records. They process data synchronously and are therefore treated as a single transaction; a DML (Data Manipulation Language) operation cannot be performed on more than 10,000 records. Triggers are suitable for tasks such as small data processing and quickly returning responses to the user. When using the application, users modify forms, save changes to individual records, and delete and add new data. By using triggers, developers can control what happens before or after users execute DML operations (a trigger sketch appears at the end of this section).
Future Apex processes data asynchronously but is not suitable for performing the same operation on a large number of records. The individual runs of the operation are not dependent on one another; therefore, there is no way around the problem of mutual blocking of transactions. Future Apex can be used, for example, when implementing the integration of two systems. The code in a method with the @future annotation is executed when the platform allocates processor resources to the given process. Such a method may attempt to communicate with another system, wait for its response, and save it to the database. Meanwhile, the rest of the code of the currently processed transaction is not blocked, and the user can continue to work without waiting for the response from the external system (the sketch at the end of this section illustrates this pattern).
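The following minimal sketches illustrate the two observations above; the Account workload, the field update, and the three-step chain are illustrative assumptions. The first class retrieves the records once in start() and processes them in independent groups; the second chains Queueable jobs so that each transaction begins only after its predecessor has completed.

// Sketch 1: Apex Batchable. The records are queried once in start();
// execute() then receives them in independent groups (the scope).
public class MassUpdateBatch implements Database.Batchable<SObject> {
    public Database.QueryLocator start(Database.BatchableContext bc) {
        // this query does not count toward the per-transaction limits
        return Database.getQueryLocator('SELECT Id, Name FROM Account');
    }
    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        for (SObject s : scope) {
            s.put('Name', ((String) s.get('Name')) + ' [processed]');
        }
        update scope; // each group runs as an independent transaction
    }
    public void finish(Database.BatchableContext bc) {
        System.debug('Batch processing finished');
    }
}

// Sketch 2: an Apex Queueable transaction chain. The next task is
// enqueued only after the current one has completed successfully.
public class ChainedStep implements Queueable {
    private Integer step;
    public ChainedStep(Integer step) { this.step = step; }
    public void execute(QueueableContext ctx) {
        // ... process the records belonging to this step here ...
        if (step < 3) {
            System.enqueueJob(new ChainedStep(step + 1));
        }
    }
}

The batch is started with Database.executeBatch(new MassUpdateBatch(), 2000), where the second argument is the group size of up to 2000 records mentioned above; the chain starts with System.enqueueJob(new ChainedStep(1)).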
Each of the tested processing methods has its advantages and disadvantages. It is important to understand well how they work; this allows them to be used appropriately for the benefit of the enterprise. The incorrect use of data manipulation methods can contribute to various problems, such as delays, irregularities in reports, and the need to introduce costly changes in the system implementation.
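To illustrate the trigger and Future Apex remarks above, a minimal sketch follows; the Account object, the service class, and the endpoint https://example.com/api are illustrative assumptions. The trigger performs only the small synchronous part of the work and hands the slow external call to a method annotated with @future.

// Sketch: a trigger delegates slow external communication to Future Apex,
// so the user saving the record does not wait for the external system.
trigger AccountSync on Account (after update) {
    AccountSyncService.notifyExternalSystem(Trigger.newMap.keySet());
}

public class AccountSyncService {
    // callout=true permits the HTTP request from the asynchronous context
    @future(callout=true)
    public static void notifyExternalSystem(Set<Id> accountIds) {
        HttpRequest req = new HttpRequest();
        req.setEndpoint('https://example.com/api'); // hypothetical endpoint
        req.setMethod('POST');
        req.setBody(JSON.serialize(new List<Id>(accountIds)));
        HttpResponse res = new Http().send(req);
        System.debug('External system replied: ' + res.getStatus());
        // the response could be saved back to the database here
    }
}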
6. Discussion and Conclusions
The methods of data processing and storage described in this paper support the work of enterprises every day. The choice of how to process and store information is an individual matter and depends largely on the size and type of business of the company. Technological advances in the development of servers, memory, and networks are changing not only the infrastructure of data centers but also the concepts of their architecture. Data centers are undergoing a constant transformation, and the data centers that will begin to emerge in the near future will look completely different from what we know today. This is mainly due to the development directions of companies, which must constantly change in order to meet market requirements. Expectations are constantly growing for IT systems that can be flexibly adapted to current needs and dynamically scaled. One of the most important factors influencing the evolution of data centers is the development of relatively new technologies that are quickly gaining popularity or will be used on a mass scale in the near future.
Cloud computing supports the work of companies around the world, and due to its numerous advantages, its popularity continues to grow. Companies that start to invest in this type of service must carefully study the varied offers of service providers. The range of possibilities is huge; therefore, it is important to think carefully about what the company really needs and how much data it would like to store and process in the cloud. Data security is also very important. Note that storing information in one's own data center does not always guarantee greater protection. Sometimes, an experienced service provider who has dealt with securing clients' data for years is able to provide and implement more security mechanisms against attacks by cyber criminals than the company itself.
The service provider in the Salesforce.com cloud model offers many ways to manipulate data on its platform. In the application created as part of this work, the results of the insert, update, and delete operations were collected for the five selected processing methods. After analyzing the results, it was specified for what purposes companies can use these methods:
Processing large amounts of data—the best methods for processing a large number of records are Apex Batchable and Apex Queueable. Both of these methods can process huge amounts of data without blocking the work of platform users or other business processes. Apex Queueable additionally allows several processes to depend on each other: every operation waits for the end of the previous one before starting. Thanks to this, the company can be sure that the end result of the processing will be complete and that nothing will be missed.
Performing recurring modifications—Apex Schedulable is best suited for modifications that are performed regularly at given intervals. This method allows one to choose at what time and how often the process is run. The platform allows many such tasks to run at the same time; the only thing to remember is that the processes must not modify the same records at the same time, since the operation in such a case will be aborted and an exception will be generated (scheduling is illustrated in the final sketch of this section).
Data processing through integration with external systems—the Future Apex method works well for communication with an external system. This processing method runs the process in a separate thread so as not to block the user's work, which is useful because waiting for a response from another system may often take a while. An additional advantage of this way of processing is the higher platform limits on database query execution and memory usage available to asynchronous processing.
Sequential data processing—if the user needs to process data in one thread, so that all instructions are executed sequentially, trigger processing should be used. The user who starts the process will not be able to perform any further operations until the response from the process is returned. Accordingly, this type of operation should not process large groups of records.
Processing a large number of tasks with the ability to monitor them—all asynchronous processes (Apex Batchable, Apex Queueable, Future Apex, and Apex Schedulable) can be monitored using the force.com platform. A system user can observe the course of the running processing, view its operation status, and read out any errors. Additionally, information about problems can be read using the developer console, which allows one to view the system logs, as sketched below.
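As a sketch of such monitoring (the field selection is illustrative), the status of asynchronous jobs can be read with a SOQL query on the standard AsyncApexJob object, e.g., from the developer console:

// Sketch: inspecting recent asynchronous jobs via the AsyncApexJob object.
List<AsyncApexJob> jobs = [
    SELECT Id, JobType, Status, NumberOfErrors,
           JobItemsProcessed, TotalJobItems, ExtendedStatus
    FROM AsyncApexJob
    ORDER BY CreatedDate DESC
    LIMIT 20
];
for (AsyncApexJob j : jobs) {
    System.debug(j.JobType + ': ' + j.Status + ', errors: ' + j.NumberOfErrors);
}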
Data processing methods can be combined, which allows one to take advantage of different solutions at the same time. An example of such a combination is the use of Apex Schedulable together with Apex Batchable. A company may need to process a large number of records periodically. Using Apex Schedulable, the start date and the periodicity of the process can be set exactly, and the execute() method of the scheduled class can then launch a class that implements the Batchable interface, making it possible to process a large number of records (a sketch of this combination follows).
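A minimal sketch of this combination, reusing the hypothetical MassUpdateBatch class from the earlier sketch: the Schedulable class launches the batch, and System.schedule() registers it with a cron expression (here, every day at 1:00 a.m.).

// Sketch: Apex Schedulable periodically launching an Apex Batchable job.
public class NightlyMassUpdate implements Schedulable {
    public void execute(SchedulableContext sc) {
        // process the records in groups of up to 2000
        Database.executeBatch(new MassUpdateBatch(), 2000);
    }
}

The job is registered once with System.schedule('Nightly mass update', '0 0 1 * * ?', new NightlyMassUpdate()); and then runs repeatedly without further intervention.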