Research Article

Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs

by Krishnam Raju Narsepalle
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 14
Published: June 2025
Authors: Krishnam Raju Narsepalle
DOI: 10.5120/ijca2025925323

Krishnam Raju Narsepalle. Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs. International Journal of Computer Applications. 187, 14 (June 2025), 1-13. DOI=10.5120/ijca2025925323

@article{10.5120/ijca2025925323,
  author    = {Krishnam Raju Narsepalle},
  title     = {Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {14},
  pages     = {1-13},
  doi       = {10.5120/ijca2025925323},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2025
%A Krishnam Raju Narsepalle
%T Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs
%J International Journal of Computer Applications
%V 187
%N 14
%P 1-13
%R 10.5120/ijca2025925323
%I Foundation of Computer Science (FCS), NY, USA
Abstract

As Large Language Models (LLMs) grow in size, their computational and energy costs rise, and with them their environmental and economic impact. This paper examines several approaches to reducing the energy and computational costs of training and deploying LLMs: sparse training, adaptive inference, and hardware acceleration (based on GPUs and TPUs). Modelling experiments with BERT and GPT indicate that sparse training reduces the computational workload by 35%, while adaptive inference reduces energy consumption during inference by 20%. In addition, optimizing resource loading on the hardware achieves a 25% energy saving. These findings suggest that energy-efficient LLM training and inference methods can significantly reduce the environmental impact of large-scale AI models, making them more sustainable for widespread use.
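
The paper's implementation is not reproduced on this page; as a rough, hedged illustration of two of the techniques named in the abstract, the sketch below combines magnitude-based weight pruning (one common form of sparse training, via PyTorch's torch.nn.utils.prune) with mixed-precision training (one common way of exploiting GPU hardware more efficiently). The stand-in model, the dummy data, and the 35% pruning fraction (chosen only to echo the abstract's figure) are placeholders, not the author's actual setup.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Tiny stand-in for a transformer feed-forward block (hypothetical model).
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Sparse training: zero out the 35% smallest-magnitude weights in each
# linear layer so later updates operate on a sparser network.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.35)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 512, device=device)       # dummy input batch
target = torch.randn(32, 512, device=device)  # dummy regression target

for step in range(10):
    optimizer.zero_grad()
    # Mixed precision: run forward/backward in float16 on GPU to cut
    # memory traffic and energy per training step.
    with torch.autocast(device_type=device, dtype=torch.float16,
                        enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Adaptive inference (for example, exiting early once an intermediate layer's prediction confidence crosses a threshold) and TPU-specific optimizations are orthogonal to this sketch and are not shown.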

References
  • Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, 4171–4186. https://doi.org/10.18653/v1/N19-1423
  • Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI. https://www.openai.com/research/language-unsupervised/
  • Frankle, J., & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. 7th International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1803.03635
  • Narang, S., Elsen, E., Diamos, G., & Sengupta, S. (2017). Exploring sparsity in recurrent neural networks. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1704.05119
  • Hubara, I., Nahshan, Y., Hoffer, E., & Soudry, D. (2021). Training with quantisation noise for extreme model compression. Advances in Neural Information Processing Systems (NeurIPS), 34, 10186–10197. https://arxiv.org/abs/2004.07320
  • Dettmers, T., Lewis, M., Shleifer, S., & Zettlemoyer, L. (2022). LLM.int8(): 8-bit matrix multiplication for transformers at scale. Advances in Neural Information Processing Systems (NeurIPS), 35. https://arxiv.org/abs/2208.07339
  • Schwartz, R., Dodge, J., Smith, N. A., & Etzioni, O. (2020). Green AI. Communications of the ACM, 63(12), 54–63. https://doi.org/10.1145/3381831
  • Liu, X., You, H., Zhang, Y., & Demmel, J. (2021). Dynamic neural networks for efficient inference. International Conference on Machine Learning (ICML). https://arxiv.org/abs/2102.04906
  • Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 3645–3650. https://doi.org/10.18653/v1/P19-1355
  • Jouppi, N. P., Young, C., Patil, N., et al. (2017). In-datacenter performance analysis of a tensor processing unit. Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 1–12. https://doi.org/10.1145/3079856.3080246
  • Gupta, U., Lee, J., Na, T., et al. (2020). Efficient AI at scale with nanoscale systems. Google AI Blog. https://ai.googleblog.com/
  • Patterson, D., Gonzalez, J., Hölzle, U., et al. (2021). Carbon emissions and large neural network training. Nature Machine Intelligence, 3(2), 89–94. https://doi.org/10.1038/s42256-020-00297-z
  • Micikevicius, P., Narang, S., Alben, J., et al. (2018). Mixed precision training. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1710.03740
  • Rajbhandari, S., Rasley, J., Ruwase, O., & He, Y. (2020). ZeRO: Memory optimizations toward training trillion parameter models. Proceedings of SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. https://arxiv.org/abs/1910.02054
  • Lacoste, A., Luccioni, A., Schmidt, V., & Dandres, T. (2019). Quantifying the carbon emissions of machine learning. NeurIPS Workshop on Tackling Climate Change with Machine Learning. https://arxiv.org/abs/1910.09700
  • Henderson, P., Hu, J., Romoff, J., et al. (2020). Towards environmentally sustainable AI: Challenges, opportunities, and a research agenda. Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency (FAccT). https://doi.org/10.1145/3351095.3372828
  • Google AI. (2020). Toward a more sustainable AI. Google Research Blog. https://ai.googleblog.com/
  • Facebook AI. (2021). Reducing the environmental impact of AI systems. Facebook AI Blog. https://ai.facebook.com/blog/
  • BigScience Workshop. (2022). BLOOM: A 176B parameter open-access language model. arXiv preprint. https://arxiv.org/abs/2211.05100
  • Black, S., et al. (2021). GPT-Neo: Large-scale autoregressive language models. EleutherAI. https://github.com/EleutherAI/gpt-neo
  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT). https://doi.org/10.1145/3442188.3445922
  • Birhane, A., van Dijk, J., & Priya, S. (2022). The cost of AI: Environmental and social impacts. NeurIPS Workshop on Machine Learning for the Developing World. https://arxiv.org/abs/2206.11990
  • Northcutt, C. G., Athalye, A., & Mueller, J. (2021). Pervasive label errors in test sets destabilize machine learning benchmarks. Journal of Machine Learning Research (JMLR), 22(1), 1–48. https://jmlr.org/papers/v22/20-950.html
  • Dodge, J., Gururangan, S., Card, D., et al. (2020). Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.18653/v1/2020.emnlp-main.522
  • Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/2005.11401
Index Terms
Computer Science
Information Sciences
Energy Efficiency
Model Optimisation
Sustainability
Artificial Intelligence (AI)
Computational Efficiency
Keywords

Energy-Efficient Training, LLM, Sparse Training, Adaptive Inference, Hardware Acceleration
