Agnostic Energy Consumption Models for Heterogeneous GPUs in Cloud Computing
Abstract
1. Introduction
- Novel GPU power consumption models that combine hybrid inputs for heterogeneous GPU architectures in cloud computing. These models aim to predict the power consumed by GPU applications running on a heterogeneous cloud-computing infrastructure.
- A novel agnostic GPU energy consumption model. This model aims to automatically estimate the energy consumed by GPU applications running on a heterogeneous cloud-computing infrastructure while abstracting away the heterogeneity of the GPU architectures.
2. Related Work
3. Power and Energy Models
3.1. Power Models
- (1) Applications
- (2) Data Collection
- (3) Input Selection
Algorithm 1: Candidacy Algorithm
Input: C[ ], A[ ]  Output: HC[ ]
1: set HC[ ], P1[ ], P2[ ], FC[ ] = null
2: m = 480 × 480
3: begin phase 1
4: for each c in C[ ] do
5:     for (i = 1; i ≤ 8; i++) do
6:         set PC1[ ] = null
7:         increase m (the matrix size) gradually
8:         profile the average power consumption for every matrix size
9:         profile the performance counter c for every matrix size
10:        enqueue the power measurement into P1[ ]
11:        enqueue the counter measurement into PC1[ ]
12:    end for
13:    apply the Pearson correlation test between P1[ ] and PC1[ ]
14:    if |r| ≥ 0.5 then
15:        enqueue c into FC[ ]
16:    else
17:        reject c
18:    end if
19: end for
20: end phase 1
21: begin phase 2
22: for each c in FC[ ] do
23:    for (i = 1; i ≤ 30; i++) do
24:        set PC2[ ] = null
25:        profile the average power consumption for every application
26:        profile the performance counter c for every application
27:        enqueue the power measurement into P2[ ]
28:        enqueue the counter measurement into PC2[ ]
29:    end for
30:    apply the Pearson correlation test between P2[ ] and PC2[ ]
31:    calculate Ω
32:    if |r| ≥ Ω then
33:        enqueue c into HC[ ]
34:    else
35:        reject c
36:    end if
37: end for
38: end phase 2
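For readers who prefer code, the two-phase filter of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helpers `run_matrix`, `run_app`, and `omega` are hypothetical stand-ins for the actual profiling runs and for the computation of the threshold Ω, which is not detailed here; only the Pearson test, the eight matrix-size runs, the 30 applications, and the 0.5 threshold come from the algorithm itself.

```python
from scipy.stats import pearsonr

def candidacy(counters, apps, run_matrix, run_app, omega):
    """Two-phase Pearson-correlation filter, sketched after Algorithm 1.

    counters   -- candidate performance-counter names (C[])
    apps       -- the 30 benchmark applications used in phase 2 (A[])
    run_matrix -- assumed helper: run the matrix kernel at a given size while
                  profiling counter c; returns (avg_power, counter_value)
    run_app    -- assumed helper: run one application while profiling counter c;
                  returns (avg_power, counter_value)
    omega      -- assumed helper: computes the phase-2 threshold Ω
    """
    # Phase 1: matrix sizes grown gradually from 480 x 480 over eight runs
    # (the exact growth step is an assumption of this sketch).
    sizes = [480 * i for i in range(1, 9)]

    fc = []                                   # counters passing phase 1 (FC[])
    for c in counters:
        p1, pc1 = zip(*(run_matrix(c, s) for s in sizes))
        r, _ = pearsonr(p1, pc1)              # correlation with power
        if abs(r) >= 0.5:
            fc.append(c)

    hc = []                                   # highly correlated counters (HC[])
    for c in fc:
        p2, pc2 = zip(*(run_app(c, a) for a in apps))
        r, _ = pearsonr(p2, pc2)
        if abs(r) >= omega(p2, pc2):
            hc.append(c)
    return hc
```

Counters that survive both phases (HC[ ]) are the ones carried forward as candidate model inputs.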
- (4) Model Design
- (5) Model Training
- (6) Hybrid Input Model
3.2. Energy Models
- (1) Applications
- (2) Data Collection and Input Selection
- (3) Model Design
4. Results
4.1. Implementation
4.2. Experimental Set-Up
4.3. Power Models’ Results
4.4. Hybrid Inputs Power Models
4.5. Energy Models
5. Discussion
- (1) Power Consumption Models
- (2) Energy Consumption Models
6. Limitations
- Each VM in cloud computing comprises several virtual resources, such as CPUs, memory, and network interfaces, all of which contribute to the total VM energy consumption. Although the workloads in this study were executed on the GPU, the developed energy consumption models considered only the energy consumed by the GPU and neglected the other resources that contribute to the total VM energy consumption (see the formulation sketched after this list).
- Cloud computing is known for the diversity and heterogeneity of its resources, and a cloud-computing infrastructure hosts a great number of VMs and PMs. However, this work considered only two VMs and two types of NVIDIA GPU architectures. This is because of the high cost of performing experiments on a real cloud testbed that contains a large number of heterogeneous GPUs. Moreover, simulation tools that support heterogeneous GPU architectures in cloud-computing environments are not available.
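To make the first limitation explicit, the total energy drawn by a VM can be decomposed into per-resource terms; the models developed here estimate only the GPU term. The decomposition below is an illustrative formulation, not an equation taken from the paper:

$$E_{\mathrm{VM}} \approx E_{\mathrm{CPU}} + E_{\mathrm{mem}} + E_{\mathrm{net}} + E_{\mathrm{GPU}},$$

where only $E_{\mathrm{GPU}}$ is captured by the proposed energy consumption models.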
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Application | Source | Description | Size |
---|---|---|---|
RecursiveGaussian | Cuda SDK | Gaussian blur implementation using Deriche's recursive Gaussian filter | 512 × 512
HSOpticalFlow | Cuda SDK | Estimation of the apparent motion of objects in an image (optical flow) | 640 × 480
Interval | Cuda SDK | Interval Newton method calculation | 65,536 equations |
SobolQRNG | Cuda SDK | Sobol quasirandom sequence implementation | 1,000,000 vectors and 1000 dimensions |
Reduction | Cuda SDK | Parallel reduction technique implementation | 16,777,216 elements |
ScalarProd | Cuda SDK | Scalar product implementation | 2048 vectors and 131,072 elements |
StereoDisparity | Cuda SDK | Stereo disparity computation implementation | 1800 × 1800 |
ThreadFenceReduction | Cuda SDK | Implementation of array reduction operation using thread fence instruction approach | 1,048,576 elements; 128 threads; 64 blocks |
Sgemm | Parboil | Single matrix multiplication implementation | A × B; A = 2048 × 1984, B = 1984 × 2112 |
Spmv | Parboil | Sparse matrix vector multiplication implementation | 146,689 × 146,689 |
Stencil | Parboil | 3D seven-point stencil implementation | 128 × 128 × 32; 200 iterations |
Heartwall | Rodinia | Ultrasound image for heart wall tracking | 656 × 744 frame |
Kmeans | Rodinia | Clustering algorithm | 494,020 objects and 34 features |
Gaussian | Rodinia | Gaussian elimination technique for solving equations | 1024 × 1024 |
Leukocyte | Rodinia | Microscopy tracking of white blood cells | 640 × 480 frames |
Nw | Rodinia | Needleman–Wunsch method for optimized DNA alignment | 6400 × 6400 |
Hotspot | Rodinia | Simulation for estimating processor temperature | 4096 × 4096
Dwt2d | Rodinia | Two-dimensional discrete wavelet transform algorithm | 3900 × 4200 pixels |
CFD | Rodinia | Computational fluid dynamic solver | 97,000 elements |
AlignedTypes | Cuda SDK | Memory aligned access implementation type | 49,999,872 bytes |
Binomial Options | Cuda SDK | European option pricing using the binomial method | 1024 options
BlackScholes | Cuda SDK | European option pricing using the Black–Scholes model | 8,000,000 options and 512 iterations
Dxtc | Cuda SDK | DirectX texture compression algorithm | 512 × 512 pixels |
ConvolutionSeparable | Cuda SDK | Convolution technique for image filtrations | 3072 × 3072 pixels |
Histogram | Cuda SDK | Histogram computation, a common data-analysis operation | 64 and 256 bins
Transpose | Cuda SDK | Matrix transpose implementation | 1024 × 1024 |
FDTD3d | Cuda SDK | Three-dimensional finite difference time domain model | 376 × 376 × 376 |
MergeSort | Cuda SDK | Merge sort implementation | 4,194,304 elements |
RadixSortThrust | Cuda SDK | Parallel radix sort implementation | 1,048,576 elements |
QuasirandomGenerator | Cuda SDK | Niederreiter quasirandom sequence implementation | 31,148,576 elements
Application | Source | Size |
---|---|---|
ConvolutionTexture | Cuda SDK | 3072 × 1536 pixels
FastWalshTransform | Cuda SDK | 8,388,608 data length
MonteCarlo | Cuda SDK | 8192 options and 262,144 paths
Nbody | Cuda SDK | 16,640 bodies |
Cutcp | Parboil | 96,603 atoms |
Histo | Parboil | 256 × 8192 |
Mri-q | Parboil | 64 × 64 × 64 |
Bfs | Rodinia | 1,000,000 nodes |
Streamcluster | Rodinia | 65,536 points |
Srad_2 | Rodinia | 8192 × 8192 data points |
Metric Name | Description | Value |
---|---|---|
Gst_transactions | Number of global memory store transactions | 0.281
Ecc_transactions | Number of error-correcting code (ECC) transactions between the L2 cache and DRAM | 0.421
L2_read_transactions | Number of L2 cache read transactions | 0.236
L2_write_transactions | Number of L2 cache write transactions | 0.281
L2_L1_read_transactions | Number of memory read transactions requested by the L1 cache and observed at the L2 cache | 0.310
L2_L1_write_transactions | Number of memory write transactions requested by the L1 cache and observed at the L2 cache | 0.310
Eligible_warps_per_cycle | Average number of warps eligible to be issued per cycle | 0.291
DRAM_read_transactions | Number of GPU device memory (DRAM) read transactions | 0.293
DRAM_write_transactions | Number of GPU device memory (DRAM) write transactions | 0.284
Metric Name | Description | Value |
---|---|---|
Gld_transactions_per_request | Average number of global memory load transactions performed per global memory load request | −0.187
Gld_transactions | Number of global memory load transactions | 0.102
Inst_per_warp | Average number of instructions executed per warp | 0.165
DRAM_read_transactions | Number of GPU device memory (DRAM) read transactions | −0.090
IPC | Instructions executed per cycle | 0.275
Issued_IPC | Instructions issued per cycle | 0.214
Application | Size |
---|---|
Gaussian | 4000 × 4000 |
Gaussian | 6000 × 6000 |
Gaussian | 8000 × 8000 |
Gaussian | 10,000 × 10,000 |
Gaussian | 12,000 × 12,000 |
Gaussian | 14,000 × 14,000 |
Gaussian | 18,000 × 18,000 |
Gaussian | 20,000 × 20,000 |
Hotspot | 1024 × 1024 |
Hotspot | 2048 × 2048 |
Hotspot | 4096 × 4096 |
Hotspot | 16,384 × 16,384 |
Nbody | 665,600 |
Nbody | 998,400 |
Nbody | 1,331,200 |
Nbody | 1,664,000 |
Nbody | 1,996,800 |
Nbody | 2,329,600 |
Nbody | 2,995,200 |
Nbody | 3,328,000 |
Srad | 3200 × 3200 |
Srad | 4800 × 4800 |
Srad | 6400 × 6400 |
Srad | 8000 × 8000 |
Srad | 9600 × 9600 |
Srad | 11,200 × 11,200 |
Srad | 14,400 × 14,400 |
Streamcluster | 65,536 |
Streamcluster | 131,072 |
Streamcluster | 262,144 |
Streamcluster | 524,288 |
Streamcluster | 2,097,152 |
Streamcluster | 4,194,304 |
Application | Size | Abbreviation |
---|---|---|
Gaussian | 16,000 × 16,000 | G16000 |
Hotspot | 8192 × 8192 | H8192 |
Nbody | 2,662,400 | N2662400 |
Srad | 12,800 × 12,800 | S12800 |
Streamcluster | 1,048,576 | SC1048576 |
Model Input | VM with C2075 | VM with K40c | Common Inputs (Agnostic Model) |
---|---|---|---|
Gst_transactions | X | ||
Ecc_transactions | X | ||
L2_read_transactions | X | ||
L2_write_transactions | X | ||
L2_l1_read_transactions | X | ||
L2_l1_write_transactions | X | ||
Dram_read_transactions | X | X | X |
Dram_write_transactions | X | ||
Gld_transactions_per_request | X | ||
Gld_transactions | X | ||
Execution time | X | X | X |
Temperature | X | X | X |
Power consumption | X | X | X |
GPU utilization | X | X | X |
Memory utilization | X | X | X |
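The "Common Inputs" column above identifies the six features available on both GPU architectures (DRAM read transactions, execution time, temperature, power consumption, GPU utilization, and memory utilization); restricting the model to these shared features is what lets a single agnostic model serve heterogeneous GPUs. The snippet below is a minimal, hypothetical Keras sketch of a DNN over those common inputs; the layer sizes, activations, optimizer settings, and loss are illustrative assumptions, not the configuration reported in the paper.

```python
from tensorflow import keras

# The six inputs marked as common to both GPU architectures in the table above.
COMMON_INPUTS = [
    "dram_read_transactions", "execution_time", "temperature",
    "power_consumption", "gpu_utilization", "memory_utilization",
]

def build_agnostic_energy_model(hidden_units=(64, 32), learning_rate=1e-3):
    """Hypothetical DNN over the common inputs; hidden-layer sizes and
    optimizer settings are illustrative, not the paper's configuration."""
    model = keras.Sequential([
        keras.layers.Input(shape=(len(COMMON_INPUTS),)),
        *[keras.layers.Dense(units, activation="relu") for units in hidden_units],
        keras.layers.Dense(1),          # predicted energy consumption
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate), loss="mae")
    return model

# Usage sketch: X has one row per profiled run, columns in COMMON_INPUTS order.
# model = build_agnostic_energy_model()
# model.fit(X_train, y_train, epochs=200, validation_split=0.2)
```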
VM Characteristics | VM1 | VM2 |
---|---|---|
CPU | Intel Xeon E5-2630 v3 2.4 GHz | Intel Xeon E5-2630 v3 2.4 GHz |
VCPU | 8 | 8 |
RAM Size | 32 GB | 64 GB |
GPU | NVIDIA Fermi C2075 | NVIDIA Kepler K40c |
Hypervisor | KVM | KVM
CUDA Compiler Version | 7.5 | 7.5
OS | Linux CentOS | Linux CentOS
VIM | OpenNebula | OpenNebula
GPU Details | Fermi C2075 | Kepler K40c |
---|---|---|
CUDA Cores | 448 | 2880 |
SMs | 14 | 15 |
Cores/SM | 32 | 192 |
Core Frequency (MHz) | 1150 | 745 |
Memory Size (GB) | 6 | 12 |
Max Power Consumption (W) | 225 | 235 |
Max Threads/Block | 1024 | 1024 |
Max Warp/SM | 48 | 64 |
Max Thread Blocks/SM | 8 | 16 |
ECC Mode | Enabled | Enabled |
Power Models | VM with Fermi C2075 GPU: Mean Error | VM with Fermi C2075 GPU: Greatest Error | VM with Kepler K40c: Mean Error | VM with Kepler K40c: Greatest Error
---|---|---|---|---
Performance Counters Inputs (Shallow) | 45 | 137 | 27.4 | 56.3
Performance Counters Inputs (Deep) | 18.1 | 38.1 | 17 | 82
Hybrid Inputs (Shallow) | 12.5 | 40 | 10 | 29
Hybrid Inputs (Deep) | 9 | 28.3 | 9 | 23
Energy Models | VM with Fermi C2075 GPU: Mean Error | VM with Fermi C2075 GPU: Greatest Error | VM with Kepler K40c: Mean Error | VM with Kepler K40c: Greatest Error
---|---|---|---|---
DNN with Standard Inputs | 9.1 | 25 | 6.5 | 24.3
DNN with Common Inputs (agnostic model) | 8.6 | 15.4 | 6.3 | 23.6
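The "Mean Error" and "Greatest Error" columns in the two results tables summarize prediction accuracy per VM. The helper below shows one way such a summary could be computed from paired predictions and measurements; interpreting the errors as absolute percentage errors is an assumption of this sketch, not something stated in this excerpt.

```python
import numpy as np

def error_summary(predicted, measured):
    """Mean and greatest absolute percentage error between model predictions
    and measured values (the percentage interpretation is an assumption)."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    pct_err = np.abs(predicted - measured) / np.abs(measured) * 100.0
    return pct_err.mean(), pct_err.max()

# Example (hypothetical arrays):
# mean_err, greatest_err = error_summary(model.predict(X_test).ravel(), y_test)
```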