Research Article

Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs

by Krishnam Raju Narsepalle
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 14
Published: June 2025
Authors: Krishnam Raju Narsepalle
DOI: 10.5120/ijca2025925323

Krishnam Raju Narsepalle. Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs. International Journal of Computer Applications. 187, 14 (June 2025), 1-13. DOI=10.5120/ijca2025925323

@article{10.5120/ijca2025925323,
  author    = {Krishnam Raju Narsepalle},
  title     = {Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {14},
  pages     = {1-13},
  doi       = {10.5120/ijca2025925323},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}

%0 Journal Article
%D 2025
%A Krishnam Raju Narsepalle
%T Energy-Efficient Training and Inference in Large Language Models: Optimizing Computational and Energy Costs
%J International Journal of Computer Applications
%V 187
%N 14
%P 1-13
%R 10.5120/ijca2025925323
%I Foundation of Computer Science (FCS), NY, USA
Abstract

As Large Language Models (LLMs) grow in size, their computational and energy costs rise, and with them their environmental and economic impact. This paper examines several approaches to reducing the energy and computational costs of training and deploying LLMs: sparse training, adaptive inference, and hardware acceleration (based on GPUs and TPUs). Modelling experiments with BERT and GPT indicate that sparse training reduces the computational workload by 35%, while adaptive inference reduces energy consumption during inference by 20%. In addition, optimizing resource loading on the hardware achieves a 25% energy saving. These findings suggest that energy-efficient LLM training and inference methods can significantly reduce the environmental impact of large-scale AI models, making them more sustainable for widespread use.
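
The paper's implementation is not reproduced on this page; as a rough, hedged illustration of two of the techniques named in the abstract, the sketch below combines magnitude-based weight pruning (one common form of sparse training, via PyTorch's torch.nn.utils.prune) with mixed-precision training (one common way of exploiting GPU hardware more efficiently). The stand-in model, the dummy data, and the 35% pruning fraction (chosen only to echo the abstract's figure) are placeholders, not the author's actual setup.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Tiny stand-in for a transformer feed-forward block (hypothetical model).
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Sparse training: zero out the 35% smallest-magnitude weights in each
# linear layer so later updates operate on a sparser network.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.35)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 512, device=device)       # dummy input batch
target = torch.randn(32, 512, device=device)  # dummy regression target

for step in range(10):
    optimizer.zero_grad()
    # Mixed precision: run forward/backward in float16 on GPU to cut
    # memory traffic and energy per training step.
    with torch.autocast(device_type=device, dtype=torch.float16,
                        enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

Adaptive inference (for example, exiting early once an intermediate layer's prediction confidence crosses a threshold) and TPU-specific optimizations are orthogonal to this sketch and are not shown.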

References
  • Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, 4171–4186. https://doi.org/10.18653/v1/N19-1423
  • Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI. https://www.openai.com/research/language-unsupervised/
  • Frankle, J., & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. 7th International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1803.03635
  • Narang, S., Elsen, E., Diamos, G., & Sengupta, S. (2017). Exploring sparsity in recurrent neural networks. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1704.05119
  • Hubara, I., Nahshan, Y., Hoffer, E., & Soudry, D. (2021). Training with quantisation noise for extreme model compression. Advances in Neural Information Processing Systems (NeurIPS), 34, 10186–10197. https://arxiv.org/abs/2004.07320
  • Dettmers, T., Lewis, M., Shleifer, S., & Zettlemoyer, L. (2022). LLM.int8(): 8-bit matrix multiplication for transformers at scale. Advances in Neural Information Processing Systems (NeurIPS), 35. https://arxiv.org/abs/2208.07339
  • Schwartz, R., Dodge, J., Smith, N. A., & Etzioni, O. (2020). Green AI. Communications of the ACM, 63(12), 54–63. https://doi.org/10.1145/3381831
  • Liu, X., You, H., Zhang, Y., & Demmel, J. (2021). Dynamic neural networks for efficient inference. International Conference on Machine Learning (ICML). https://arxiv.org/abs/2102.04906
  • Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 3645–3650. https://doi.org/10.18653/v1/P19-1355
  • Jouppi, N. P., Young, C., Patil, N., et al. (2017). In-datacenter performance analysis of a tensor processing unit. Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 1–12. https://doi.org/10.1145/3079856.3080246
  • Gupta, U., Lee, J., Na, T., et al. (2020). Efficient AI at scale with nanoscale systems. Google AI Blog. https://ai.googleblog.com/
  • Patterson, D., Gonzalez, J., Hölzle, U., et al. (2021). Carbon emissions and large neural network training. Nature Machine Intelligence, 3(2), 89–94. https://doi.org/10.1038/s42256-020-00297-z
  • Micikevicius, P., Narang, S., Alben, J., et al. (2018). Mixed precision training. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1710.03740
  • Rajbhandari, S., Rasley, J., Ruwase, O., & He, Y. (2020). ZeRO: Memory optimizations toward training trillion parameter models. Proceedings of SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. https://arxiv.org/abs/1910.02054
  • Lacoste, A., Luccioni, A., Schmidt, V., & Dandres, T. (2019). Quantifying the carbon emissions of machine learning. NeurIPS Workshop on Tackling Climate Change with Machine Learning. https://arxiv.org/abs/1910.09700
  • Henderson, P., Hu, J., Romoff, J., et al. (2020). Towards environmentally sustainable AI: Challenges, opportunities, and a research agenda. Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency (FAccT). https://doi.org/10.1145/3351095.3372828
  • Google AI. (2020). Toward a more sustainable AI. Google Research Blog. https://ai.googleblog.com/
  • Facebook AI. (2021). Reducing the environmental impact of AI systems. Facebook AI Blog. https://ai.facebook.com/blog/
  • BigScience Workshop. (2022). BLOOM: A 176B parameter open-access language model. arXiv preprint. https://arxiv.org/abs/2211.05100
  • Black, S., et al. (2021). GPT-Neo: Large-scale autoregressive language models. EleutherAI. https://github.com/EleutherAI/gpt-neo
  • Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT). https://doi.org/10.1145/3442188.3445922
  • Birhane, A., van Dijk, J., & Priya, S. (2022). The cost of AI: Environmental and social impacts. NeurIPS Workshop on Machine Learning for the Developing World. https://arxiv.org/abs/2206.11990
  • Northcutt, C. G., Athalye, A., & Mueller, J. (2021). Pervasive label errors in test sets destabilize machine learning benchmarks. Journal of Machine Learning Research (JMLR), 22(1), 1–48. https://jmlr.org/papers/v22/20-950.html
  • Dodge, J., Gururangan, S., Card, D., et al. (2020). Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.18653/v1/2020.emnlp-main.522
  • Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems (NeurIPS). https://arxiv.org/abs/2005.11401
Index Terms
Computer Science
Information Sciences
Energy Efficiency
Model Optimisation
Sustainability
Artificial Intelligence (AI)
Computational Efficiency
Keywords

Energy-Efficient Training, LLM, Sparse Training, Adaptive Inference, Hardware Acceleration
