SHIYF: A Secured and High-Integrity YARN Framework
Abstract
1. Introduction
- Service identity forging. Since there is no service authentication, any malicious node can masquerade as a secure node and join the Hadoop cluster to obtain or compute data as long as it knows the ResourceManager (RM) address.
- User identity forging. Because there is no user authentication, any malicious client can forge a user identity to access Hadoop Distributed File System (HDFS) data or manage jobs.
- Lack of an authorization mechanism. A client can perform any operation; for example, a job submitted by user A can be killed by user B at will.
- Data communications are not encrypted, so they are vulnerable to eavesdropping.
- Speculative execution is leveraged to provide security for Hadoop YARN.
- Some significant security improvements are made to Hadoop 2.0 in SHIYF, such as ensuring the correctness of MRv2 results and locating the malicious nodes and the potential ones in the Hadoop cluster.
- A prototype of SHIYF is implemented based on Hadoop 2.8.0.
- Results of theoretical derivations show that SHIYF adds 30% speculative tasks in the MRv2 job and achieves a malicious node detection ratio of more than 90%.
- Experiment results show that SHIYF can ensure the security of MRv2 services while increasing overhead slightly. Moreover, the malicious node detection ratio is between 87% and 93.3%.
- This finding is in line with the expectation of theoretical derivation.
2. SHIYF Design and Implementation
2.1. SHIYF Design
- Job creation, initialization, startup, and so on.
- Applying to the RM for resources and reallocating them.
- Container startup and release.
- Monitoring the operation status of the job.
- Job recovery.
2.2. SHIYF Implementation
2.2.1. SHIYF ContainerAllocator
2.2.2. SHIYF Speculator
- Whether the current task already had a backup task. Every task could have up to two speculative tasks, for a maximum of three attempts in total.
- The proportion of completed tasks was not less than MINIMUM_COMPLETE_PROPORTION_TO_SPECULATE (5%); only then did the Speculator have sufficient historical task information to estimate estimatedReplacementEndTime.
- DefaultSpeculator could launch speculative execution with a certain probability without calculating the speculationValue.
- MINIMUM_ALLOWED_SPECULATIVE_TASKS = 10. It represents the minimum number of total speculative tasks that are allowed for a job.
- PROPORTION_TOTAL_TASKS_SPECULATABLE = 0.35. It denotes that speculative tasks may account for at most 35% of the total tasks.
- PROPORTION_RUNNING_TASKS_SPECULATABLE = 0.3. It indicates that speculative tasks may account for at most 30% of all running tasks.
- MINIMUM_ALLOWED_SPECULATIVE_TASKS
- PROPORTION_TOTAL_TASKS_SPECULATABLE * totalTaskNumber
- PROPORTION_RUNNING_TASKS_SPECULATABLE * numberRunningTasks
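Taken together, the number of allowed speculative tasks is bounded by the largest of the three quantities above. A minimal sketch of this cap (SHIYF itself is implemented in Java inside Hadoop 2.8.0; this Python function is ours for illustration, and only the constants come from the text):

```python
# Constants from the DefaultSpeculator-style configuration described above.
MINIMUM_ALLOWED_SPECULATIVE_TASKS = 10
PROPORTION_TOTAL_TASKS_SPECULATABLE = 0.35
PROPORTION_RUNNING_TASKS_SPECULATABLE = 0.3

def max_speculative_tasks(total_tasks: int, running_tasks: int) -> int:
    """Upper bound on speculative tasks for a job: the largest of the
    three limits listed above (an illustrative sketch, not Hadoop code)."""
    return int(max(
        MINIMUM_ALLOWED_SPECULATIVE_TASKS,
        PROPORTION_TOTAL_TASKS_SPECULATABLE * total_tasks,
        PROPORTION_RUNNING_TASKS_SPECULATABLE * running_tasks,
    ))
```

For example, a job with 240 total tasks and 100 running tasks would be allowed up to max(10, 84, 30) = 84 speculative tasks under this bound.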
2.2.3. SHIYF Security Control
- In Job, the hostnames of nodes that failed to execute tasks were recorded and written to HDFS logs. If failures occurred more than five times, SHIYF considered these nodes malicious.
- If two TaskAttempts processed the same data but returned different hashes, Task launched another speculative TaskAttempt to verify the result again. A node that returned a wrong hash once was recorded as a potential malicious node. If the hash comparison failed twice, Task returned “JOB_TASK_UNCOMPLETED” to Job and restarted; moreover, all three nodes involved in this task were considered potential malicious nodes.
- A TaskAttempt launched for speculative checking computes the MD5 hash of its result and transmits it to Task, whereas a normal TaskAttempt does not. These additions are highlighted in red in Figure 7.
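The hash check above can be sketched as follows (Python for illustration only; SHIYF computes the hashes inside the Java TaskAttempt, and the function names here are hypothetical):

```python
import hashlib

def result_md5(result_bytes: bytes) -> str:
    # Digest that a speculative TaskAttempt would compute over its
    # result and transmit back to Task.
    return hashlib.md5(result_bytes).hexdigest()

def hashes_match(hash_a: str, hash_b: str) -> bool:
    # Task-side comparison of two attempts that processed the same split.
    return hash_a == hash_b
```

If the comparison fails, Task launches another speculative TaskAttempt to verify the result again, as described above.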
2.2.4. SHIYF State Management
- To check some task results, SHIYF needs TaskAttempts and their speculative executions to run in parallel until they complete and return MD5 hashes. Therefore, a Task in SHIYF is allowed to retain two or three speculative Attempts at the same time; that is, Task does not kill the other corresponding Attempts when it receives “T_ATTEMPT_COMMIT_PENDING” while an Attempt is running.
- When Task received “T_ADD_SPEC_ATTEMPT,” it created a new speculative Attempt to run the same task. The tasks chosen for checking had their speculative executions tagged with “Extra_SETask,” which served as the criterion for launching MD5 computation in TaskAttempt.
- When a TaskAttempt runs successfully, Task in YARN receives “T_ATTEMPT_SUCCEEDED” and kills the other Attempts. However, SHIYF needs to compare the MD5 hashes of the two identical TaskAttempts to ensure the validity of the results. Therefore, even if an Attempt has completed and returned its MD5 hash, Task must still wait for the other speculative TaskAttempts to finish. Thus, several other relevant improvements were made as follows.
- An event “T_ATTEMPT_MD5_COMPARE” was added in “RUNNING.” This event triggered MD5 hash comparison.
- If the first comparison failed but the second or third succeeded, Task added a “SUCCEED_FALSE” flag to mark an Attempt that executed successfully but once returned a wrong MD5 hash. At the same time, Task recorded the hostnames of these TaskAttempt machines as evidence of potential malicious nodes.
- “TA_ATTEMPT_SUCCEEDED,” “T_ADD_SPEC_ATTEMPT,” and “T_ATTEMPT_COMMIT_PENDING” in “SUCCEEDED” must be changed accordingly to control and trigger the state transition.
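The RUNNING-state handling above can be sketched as a small event handler (the event and flag names come from the text; the class itself is an illustrative simplification, not YARN's actual finite-state machine code):

```python
class TaskVerifier:
    """Sketch of how a Task might react to "T_ATTEMPT_MD5_COMPARE"."""

    def __init__(self):
        self.failed_comparisons = 0
        self.flags = []

    def on_md5_compare(self, hash_a: str, hash_b: str) -> str:
        if hash_a == hash_b:
            if self.failed_comparisons > 0:
                # A later comparison succeeded after an earlier failure:
                # the attempt ran successfully but once returned a wrong hash.
                self.flags.append("SUCCEED_FALSE")
            return "T_ATTEMPT_SUCCEEDED"
        self.failed_comparisons += 1
        if self.failed_comparisons >= 2:
            # Two failed comparisons: report to Job and restart the task.
            return "JOB_TASK_UNCOMPLETED"
        # First failure: launch another speculative attempt to re-verify.
        return "T_ADD_SPEC_ATTEMPT"
```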
3. Theoretical Derivation
3.1. Theoretical Arithmetic
3.2. Theoretical Results
- The detection ratio Dratio increased with the increase in the execution ratio Er, the number of jobs t, and the malicious action probability P.
- The number of blocks b had a minimal impact on Dratio.
- As long as the number of jobs t was 25 or greater, we could set Er at a low level (≤30%) and still achieve the desired Dratio (≥85%) when P ≥ 0.2. Moreover, the larger P was, the higher Dratio was.
- Furthermore, if map speculative tasks and reduce speculative tasks are combined, we can reasonably expect Dratio to exceed 90%.
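These trends can be illustrated with a deliberately simplified model (this is not the paper's derivation): suppose each of the t jobs gives a malicious node one opportunity to misbehave with probability P, and a misbehavior is caught with probability Er, the fraction of executions that are verified. The chance of detecting the node at least once is then 1 − (1 − P·Er)^t, which increases with Er, t, and P:

```python
def detection_probability(er: float, p: float, t: int) -> float:
    """Chance of catching a malicious node at least once over t jobs,
    under the simplified per-job model described in the lead-in."""
    return 1.0 - (1.0 - p * er) ** t
```

This toy model reproduces the monotonic behavior reported above, although the paper's own formula accounts for blocks and other factors omitted here.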
4. SHIYF Experiments
- The file replication number of HDFS (dfs.replication) was set to 2 because the experiments were executed in a local rack. The minimum size of each file chunk was set to 256 MB to facilitate the processing of large files. To avoid a large number of data copies from remote machines, the split size was set equal to the block size, so each task processes exactly one split.
- Given that the six NM machines were each equipped with one quad-core CPU, “mapred.tasktracker.tasks.maximum” was set to 4. The number of reduces was 1.75 × (the number of NMs × mapred.tasktracker.tasks.maximum), namely 42. The faster NMs that finished their first round of reduce tasks could then launch the second round immediately, markedly improving load balancing.
- In Hadoop, speculative execution is enabled by default.
- In SHIYF, 30% of the Map and Reduce tasks were selected randomly to check the validity of results; these tasks therefore execute speculative tasks and MD5 hash computations.
- In SHIYF, two NMs execute malicious behaviors and return wrong MD5 hashes with 20% probability, which is equivalent to between 7% and 33.3% of the Hadoop cluster's nodes being malicious.
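Two of the configuration figures above can be checked directly (the counts and the Hadoop property name come from the text; the variable names are ours):

```python
NUM_NODEMANAGERS = 6   # six NM machines, one quad-core CPU each
TASKS_MAXIMUM = 4      # mapred.tasktracker.tasks.maximum

# Number of reduces: 1.75 x (NMs x tasks.maximum) = 1.75 x 24 = 42.
num_reduces = int(1.75 * NUM_NODEMANAGERS * TASKS_MAXIMUM)

# Two malicious NMs out of six is one third of the cluster (33.3%).
MALICIOUS_NMS = 2
malicious_fraction = MALICIOUS_NMS / NUM_NODEMANAGERS
```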
4.1. WordCount Benchmark
4.1.1. Execution Results of SHIYF
- In the original YARN framework, although “60 × 1 G” has 60 times as many input paths as “60 G,” the time cost increases only slightly with the number of input paths when the number of blocks is the same (240), according to Formula (11).
- Without malicious nodes, the time cost of WordCount increases by only approximately 9% in SHIYF. A new speculative TaskAttempt is not equivalent to a new task; therefore, the job time does not increase by 30%. The extra time cost comes mainly from the communication of the speculative TaskAttempts; by contrast, MD5 hash computation and comparison have little influence on SHIYF.
- When two malicious NMs are present in SHIYF, the probability of Map/Reduce tasks being assigned to them is close to 33.3% because of the load balancing of the Hadoop cluster, and the probability of malicious behavior is 20%. Therefore, the increased time is mainly due to Task waiting for the returned values of the extra speculative TaskAttempts. The time cost of SHIYF increases by 16%–20% compared with the original condition.
4.1.2. Malicious Node Detection Ratio of SHIYF
- “hadoop2” and “hadoop5” are the malicious nodes; “hadoop1” and “hadoop3” are the potential malicious ones.
- The malicious node detection ratio of SHIYF is between 87% and 93.3%, in line with the theoretical derivation shown in Figure 9 in Section 3. The ratio falls short of 100% mainly because “hadoop2”/“hadoop5” occasionally executed malicious behaviors in container tasks that were not chosen for verification, so those instances were not among the verified malicious behaviors.
- Based on the conclusions of the theoretical derivations in Section 3, the detection ratio increases with the execution ratio Er, the number of jobs t, and the malicious action probability P. Consequently, we believe that SHIYF will achieve an even better malicious node detection ratio on a larger cluster and test data set.
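The reported range is consistent with the counts in the detection-results table (22–30 malicious behaviors, 20–28 of them recorded), since the detection ratio is simply the recorded behaviors divided by the actual ones. For example, under one plausible pairing of the endpoints:

```python
def detection_pct(recorded: int, actual: int) -> float:
    # Detection ratio as a percentage of actual malicious behaviors.
    return round(100.0 * recorded / actual, 1)

high = detection_pct(28, 30)  # upper end of the reported range, 93.3%
low = detection_pct(20, 23)   # a lower-end combination, 87.0%
```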
4.1.3. Resource Utilization of SHIYF
ResourceManager
- The addition of 30% extra speculative executions and executing MD5 hash computations and comparisons have a weak influence on RM. Figure 12a shows that the CPU utilization of RM in the WordCount experiment is relatively low except for the initial stage.
- Adding 30% speculative tasks and 33.3% malicious tasks merely adds a few status monitors for NMs and some information communication between RM and NMs; memory utilization remains below 36%. Moreover, the memory utilization of RM is markedly smooth, as shown in Figure 12b.
- Several reference variables are recorded to show the disk influence of SHIYF on RM, including the number of transfers per second (“tps”), sectors read/written per second (“rd_sec/wr_sec”), the average size in sectors of the requests issued to the device (“avgrq-sz”), the average queue length of the requests issued to the device (“avgqu-sz”), and so on. We take the most representative parameter, “wr_sec/s,” as an example. Figure 12c shows that adding 30% speculative tasks and MD5 comparisons has a weak influence on the disk throughput of RM; the primary influence appears in the initial and final phases, when more NM statuses are transmitted to RM and SHIYF noticeably increases the disk writes of RM.
- The total number of packets received per second (“rxpck/s”), the total number of packets sent per second (“txpck/s”), and the data size received per second (“rxkB/s”), among others, are recorded for monitoring the network throughput. Taking “rxpck/s” as an example, Figure 12d shows that adding 30% speculative tasks and 33.3% malicious nodes has a minimal influence on the network throughput of RM; the repeated computation and comparison of MD5 hashes in SHIYF only add some resource applications and NM status reports.
NodeManager: NM(MRAppMaster)
- The CPU utilization of NM (MRAppMaster) is shown in Figure 13a. In SHIYF, the lowest CPU occupancy is above 80%, and the job takes longer than in the original YARN; however, the increases are under 20%, and lower CPU utilization would be seen if SHIYF were built on more powerful clusters.
- In the three conditions, the memory utilization of NM (MRAppMaster) differs only slightly, as shown in Figure 13b.
- Some reference parameters are recorded to show the influence of SHIYF on the disk of NM (MRAppMaster), such as “tps” (the number of I/O operations per second on the physical disk), “rd_sec/wr_sec” (the number of sectors read/written from the device per second), “avgqu-sz” (the average length of the queue of I/O requests waiting to be processed), and “util%” (the percentage of each second devoted to I/O operations). The number of sectors read from the device per second (rd_sec/s) is the most representative. The average disk reading speeds are close in the three conditions, as shown in Figure 13c; the slight increase occurred because NM (MRAppMaster) launched extra speculative tasks to compute and compare MD5 hashes.
- The total number of packets transmitted per second (txpck/s) indicates the influence of SHIYF on the network throughput of NM (MRAppMaster), as shown in Figure 13d. When adding 30% speculative tasks and 33.3% malicious nodes, SHIYF increases some network communication of NM (MRAppMaster) with RM and the other NMs, because NM (MRAppMaster) must report more node statuses to RM and communicate with more containers. However, the extra overhead is affordable.
NodeManager: NM (Containers)
- Figure 14c shows the number of sectors read from the device per second (rd_sec/s) in NM (Containers). Compared with Figure 13c, the disk throughput of NM (Containers) peaks earlier than that of NM (MRAppMaster), and the average throughput is higher. This shows that the machine running MRAppMaster is allocated fewer containers for dynamic load balancing in the Hadoop cluster. In any case, the effect of SHIYF is weak in all three conditions.
- A comparison of Figure 13d and Figure 14d shows that NM (Containers) also needs to report more node statuses to RM and communicate more with NM (MRAppMaster) in SHIYF. However, the resource consumption of NM (Containers) is lower than that of NM (MRAppMaster), and the overhead remains affordable.
- SHIYF can locate the malicious nodes and the potential malicious ones. The malicious node detection ratio is between 87% and 93.3%. It is in line with the expected theoretical derivation.
- The increasing time cost of SHIYF is between 16% and 20%. Moreover, it has little effect on increasing the resource overhead.
- The limited computing ability of the experiment hardware may increase the time cost and resource consumption. We expect SHIYF to perform better when it executes a much larger range of jobs on a more powerful Hadoop cluster; in that case, SHIYF can use a lower speculative execution ratio while still achieving a high malicious node detection ratio.
4.2. TestDFSIO Benchmark
4.2.1. Execution Results of SHIYF
- In the original condition, we test the performance of the read-and-write file system of YARN without any modification.
- In the SHIYF (30% duplicate) condition, we temporarily disable the MD5 comparison of SHIYF: MD5 is a simple and efficient digest algorithm, no malicious nodes occur in this condition, and, as verified in Section 4.1, the comparison has little impact on the total job execution time.
- In the SHIYF (33.3% malicious) condition, every task chosen for checking needs to compute and compare the MD5 hashes of the intermediate or final results. Because every TaskAttempt result differs, we keep only the “tasks” and “size” fields as the Map/Reduce results to ensure that the MD5 hashes of identical TaskAttempts run on secure NMs are equal.
- In all three conditions, the read speed is much faster than the write speed. At first, the running time decreased as file size increased, indicating that HDFS is highly suitable for large-scale reading and writing. However, the average running time then increased along with file size because of the inevitable involvement of more cluster nodes, complicated hardware configurations, and other factors.
- Increasing speculative executions by 30% corresponds to 30% more TaskAttempts. However, the speculative executions launch simultaneously with the original TaskAttempts; thus, the increase in time consumption is under 8%, as shown in Table 3. This is mainly because of the inconsistency of TaskAttempt completion times, in both the “WRITE” test and the “READ” test.
- In theory, 33.3% malicious nodes executing malicious behaviors with 20% probability is equivalent to approximately 6.66% extra speculative executions. The time increase comes primarily from waiting for the second speculative execution, and the increase in total time cost does not exceed 16% compared with the original YARN.
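The 6.66% figure is direct arithmetic on the setup: one third of the nodes are malicious and each misbehaves with 20% probability, so the expected fraction of executions needing a second verification round is their product:

```python
malicious_node_fraction = 1 / 3   # 33.3% of the NMs are malicious
misbehavior_probability = 0.20    # wrong MD5 hash with 20% probability

# Expected fraction of extra speculative executions: about 6.66%.
extra_speculative_fraction = malicious_node_fraction * misbehavior_probability
```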
4.2.2. Influence of SHIYF to Network Throughput
- More files written on HDFS correspond to more copied files transmitted on the network. Therefore, the network throughput is higher.
- The highest network throughput is 64.591 MB/s, corresponding to a maximum growth rate of 28.28% in network throughput. However, this value is far below the bandwidth of a gigabit network (approximately 128 MB/s), so no significant bandwidth load occurs; instead, network bandwidth utilization improves.
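The 28.28% growth rate can be recomputed from the network-throughput table (50.352 MB/s for the original YARN versus 64.591 MB/s with 33.3% malicious nodes, both at the largest file size):

```python
original_mb_s = 50.352    # YARN (original)
malicious_mb_s = 64.591   # SHIYF (33.3% malicious)

growth_pct = round(100.0 * (malicious_mb_s - original_mb_s) / original_mb_s, 2)
gigabit_ceiling_mb_s = 128.0  # approximation used in the text
```

Both values stay well below the gigabit ceiling, matching the conclusion that no significant bandwidth load occurs.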
4.2.3. Influence of SHIYF to HDFS
- More speculative executions correspond to more data read from or written to HDFS. However, the changes in the curves in the three conditions were minimal; moreover, they interlaced and partially overlapped.
- Although SHIYF increases disk use and efficiency, it does not increase the hard disk load; the changes in the three conditions remain minimal and close to the ideal state.
- SHIYF affects the running time of TestDFSIO but has no effect on the read and write performance of HDFS.
4.3. MRBench Benchmark
4.3.1. Execution Results of SHIYF
- inputLines = 1000. Each generated file contains 1000 lines.
- maps = 200. Each run uses 200 maps.
- reduces = 100. Each run uses 100 reduces.
- numRuns = 10, 15, 20, 25, 30, and 40.
- In these three conditions, every experiment with the same configuration is executed with a different number of repetitions. The execution time in Figure 16 is the average over the repeated runs of the same job in SHIYF; more repetitions yield a more accurate execution time.
- Adding 30% speculative executions increases the MRBench time by approximately 9%, mainly because of the inconsistent completion times of TaskAttempts; the added MD5 hash computations and comparisons also contribute.
- In the 33.3% malicious nodes condition, execution time increases by approximately 16% because of the extra speculative TaskAttempts and mismatches between the two compared MD5 hashes.
4.3.2. Malicious Node Location of SHIYF
- No malicious action record about hadoop1 is found in the Job logs; thus, it is a secure NM.
- The average number of records for hadoop3/hadoop6 is between 0 and 1, mainly because two consecutive failed MD5 hash verifications were occasionally recorded over the 25 experiments. Therefore, they might be potential malicious NMs: although they are in fact secure NMs, they are treated as potential malicious ones when they verify a result together with a malicious NM and the results are inconsistent.
- The average number of malicious behaviors recorded for hadoop2/hadoop5 is 9, so we can judge them to be the malicious NMs in the Hadoop cluster. Therefore, the malicious node detection ratio of SHIYF is at least 90% in the MRBench benchmark.
- Not only can SHIYF achieve a high malicious node detection ratio, but it can also locate the malicious nodes and the potential ones accurately.
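The "at least 90%" figure follows from the per-node counts above: an average of 9 recorded behaviors against at most 10 actual malicious actions per node:

```python
avg_records = 9      # average malicious-behavior records for hadoop2/hadoop5
max_behaviors = 10   # at most 10 malicious actions per node

min_detection_pct = 100.0 * avg_records / max_behaviors  # 90.0
```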
5. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A
Abbreviation | Definition
---|---
YARN | Yet Another Resource Negotiator
MRv2 | MapReduce 2.0
HDFS | Hadoop Distributed File System
SHIYF | secure and high-integrity YARN framework
CC | cloud computing
RM | ResourceManager
NM | NodeManager
FSM | finite-state machine
CA | ContainerAllocator
CPU | Central Processing Unit
I/O | Input/Output
Reference Value | YARN (Original) | SHIYF (30% Duplicate) | SHIYF (33% Malicious) |
---|---|---|---|
Map (block numbers) | 240 | 312 | 334–342 |
Reduce (block numbers) | 240 | 314 | 337–345 |
Malicious behavior (times) | 0 | 0 | 22–30 |
Malicious behavior records (times) | 0 | 0 | 20–28 |
The malicious nodes (hostnames) | none | none | hadoop2, hadoop5 |
The potential malicious nodes (hostnames) | none | none | hadoop1, hadoop3 |
Detection ratio (%) | 0 | 0 | 87%–93.3% |
TestDFSIO | Statistic | | | |
---|---|---|---|---|---
WRITE (ORIGINAL) | Throughput (mb/sec) | 3.6532858730067091 | 4.1495804701173095 | 4.5426758165430426 | 5.0352236239847136
 | Average I/O rate (mb/sec) | 3.8734657834132454 | 4.3471371178521935 | 4.7294176523151477 | 5.1312346348895038
 | I/O rate std deviation | 0.430984357634622 | 0.951427385613907 | 0.615915048946195 | 0.445020674520458
 | Test exec time (sec) | 2198.111 | 1561.544 | 1673.816 | 1890.473
READ (ORIGINAL) | Throughput (mb/sec) | 12.970596932628902 | 18.046781056091405 | 17.784562337941026 | 15.858087512060852
 | Average I/O rate (mb/sec) | 13.012415248157041 | 18.178287573625446 | 17.792148534176382 | 15.885178120960451
 | I/O rate std deviation | 1.498127648219716 | 1.246210558435681 | 0.485214225977834 | 0.781471278556941
 | Test exec time (sec) | 660.835 | 474.956 | 481.959 | 540.508
WRITE (30% SPECULATIVE) | Throughput (mb/sec) | 4.1869146712609413 | 4.8050487636647159 | 5.1450188934706481 | 5.5417910116501725
 | Average I/O rate (mb/sec) | 4.6151871263970814 | 4.7173163452271386 | 5.3504192374693026 | 5.9544018520531907
 | I/O rate std deviation | 1.657201551021365 | 1.711504547126103 | 0.935136452047035 | 0.753113208091526
 | Test exec time (sec) | 2371.649 | 1662.184 | 1764.051 | 1970.018
READ (30% SPECULATIVE) | Throughput (mb/sec) | 12.457052964837827 | 17.893025372274712 | 17.039617160846092 | 14.901470746125078
 | Average I/O rate (mb/sec) | 12.568542145289061 | 17.932455253601074 | 17.809027862581273 | 15.076180259014069
 | I/O rate std deviation | 1.0519670225048039 | 0.5843634163160046 | 1.6418415710048093 | 0.8952104835410775
 | Test exec time (sec) | 688.078 | 479.037 | 506.295 | 578.209
WRITE (33.3% MALICIOUS) | Throughput (mb/sec) | 5.272581215418207 | 5.6601450830043722 | 6.0414721722833548 | 6.4590725064507841
 | Average I/O rate (mb/sec) | 5.3641939105048191 | 5.9178047153028364 | 6.4681551987331872 | 7.0720323180910965
 | I/O rate std deviation | 0.505720201569015 | 1.450121028363904 | 1.256178142824583 | 2.045873016820649
 | Test exec time (sec) | 2549.196 | 1768.511 | 1836.149 | 2104.951
READ (33.3% MALICIOUS) | Throughput (mb/sec) | 11.990184970451683 | 17.0845710814059107 | 16.193211574396384 | 14.006810615806931
 | Average I/O rate (mb/sec) | 12.005265249203364 | 17.83786672858012 | 16.350187485045837 | 14.75482619974933
 | I/O rate std deviation | 0.872031602505907 | 2.010804105194108 | 0.949113918481016 | 2.06249176210582
 | Test exec time (sec) | 716.870 | 503.706 | 530.322 | 610.947
Scenarios | Temporal Growth Rate (Read) | Temporal Growth Rate (Write) |
---|---|---|
YARN (original) | 0 | 0 |
SHIYF (30% duplicate) | 6.981% | 7.895% |
SHIYF (33% malicious) | 13.031% | 15.094% |
Network Throughput (mb/sec) | ||||
---|---|---|---|---|
YARN (original) | 0.609 | 6.916 | 30.285 | 50.352 |
SHIYF (30% duplicate) | 0.698 | 8.008 | 34.300 | 55.417 |
SHIYF (33% malicious) | 0.879 | 9.434 | 40.276 | 64.591 |
Reference Value | hadoop2 | hadoop5 | hadoop1 | hadoop3 | hadoop6 |
---|---|---|---|---|---|
Malicious action times | ≤10 | ≤10 | 0 | 0 | 0 |
Record times | 9 | 9 | 0 | 0–1 | 0–1 |
Malicious node or Potential Malicious node | Malicious node | Malicious node | None | Potential Malicious node | Potential Malicious node |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Deng, J.; Liu, Y.; Wang, J.; Li, S. SHIYF: A Secured and High-Integrity YARN Framework. Electronics 2019, 8, 548. https://doi.org/10.3390/electronics8050548