Research Article

Text-to-Image Synthesis with Stable Diffusion: Evaluation and Performance Analysis

by Mehek Richharia, Aryan Gupta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 22
Published: July 2025
Authors: Mehek Richharia, Aryan Gupta
DOI: 10.5120/ijca2025925353

Mehek Richharia, Aryan Gupta. Text-to-Image Synthesis with Stable Diffusion: Evaluation and Performance Analysis. International Journal of Computer Applications 187, 22 (July 2025), 23-30. DOI=10.5120/ijca2025925353

@article{10.5120/ijca2025925353,
  author    = {Mehek Richharia and Aryan Gupta},
  title     = {Text-to-Image Synthesis with Stable Diffusion: Evaluation and Performance Analysis},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {22},
  pages     = {23-30},
  doi       = {10.5120/ijca2025925353},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2025
%A Mehek Richharia
%A Aryan Gupta
%T Text-to-Image Synthesis with Stable Diffusion: Evaluation and Performance Analysis
%J International Journal of Computer Applications
%V 187
%N 22
%P 23-30
%R 10.5120/ijca2025925353
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Recent progress in machine learning, especially in image generation, has led to success in producing high-quality images from text descriptions. Among these advances, the widespread adoption of diffusion models stands out for their robustness, flexibility, and ability to produce realistic and diverse images. Unlike traditional generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), which often suffer from training instability and mode collapse, diffusion models offer a more stable framework for image generation. These models build on the principles of diffusion processes, iteratively transforming random noise into coherent images, which results in improved performance and reliability. This paper provides a comprehensive review of the latest version of Stable Diffusion, focusing on its architecture, core principles, and practical applications. The study compares Stable Diffusion with other leading generative models in terms of image quality, stability, and computational efficiency. It also highlights Hugging Face's role in democratizing AI-driven image generation by making Stable Diffusion accessible through open-source platforms, enabling researchers, developers, and enthusiasts to customize and extend the model for a wide range of innovative and practical applications. The review further considers the broader implications of diffusion models for AI-driven creativity, especially in fields such as art, design, advertising, and entertainment. By analyzing the strengths and limitations of Stable Diffusion, the paper aims to offer insight into its potential to shape the future of image generation technology. It also addresses open challenges, including the need for greater diversity in generated images and for lower computational costs. The paper serves as a resource for researchers and practitioners interested in the evolving landscape of text-to-image synthesis and the transformative potential of diffusion models in artificial intelligence.
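
The accessibility point above can be illustrated with a short sketch. The snippet below is a minimal, illustrative example assuming Hugging Face's open-source diffusers library and a publicly hosted Stable Diffusion checkpoint; the model identifier, prompt, and sampler settings are placeholders chosen for illustration, not details taken from the paper. It loads a pretrained pipeline and runs the iterative denoising loop that turns random noise into an image conditioned on a text prompt.

    # Minimal, illustrative sketch (assumes the `diffusers` and `torch` packages;
    # the checkpoint name and settings below are placeholders, not values from the paper).
    import torch
    from diffusers import StableDiffusionPipeline

    # Download a pretrained Stable Diffusion checkpoint from the Hugging Face Hub.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed public checkpoint
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")  # use pipe.to("cpu") if no GPU is available

    # Iteratively denoise random latent noise, guided by the text prompt.
    result = pipe(
        "a watercolor painting of a lighthouse at sunrise",
        num_inference_steps=50,  # number of denoising iterations
        guidance_scale=7.5,      # strength of text conditioning (classifier-free guidance)
    )
    result.images[0].save("lighthouse.png")

The num_inference_steps and guidance_scale parameters map directly onto the iterative denoising and text conditioning described above; fewer steps trade image quality for speed, which is one axis of the computational-cost concerns the paper raises.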

Index Terms
Computer Science
Information Sciences
Keywords

Text-to-Image, DALL-E, Stable Diffusion, Image Generation
