Research Article

Text-to-Image Synthesis with Stable Diffusion: Evaluation and Performance Analysis

by Mehek Richharia, Aryan Gupta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Issue 22
Published: July 2025
Authors: Mehek Richharia, Aryan Gupta
DOI: 10.5120/ijca2025925353

Mehek Richharia, Aryan Gupta. Text-to-Image Synthesis with Stable Diffusion: Evaluation and Performance Analysis. International Journal of Computer Applications 187, 22 (July 2025), 23-30. DOI=10.5120/ijca2025925353

@article{10.5120/ijca2025925353,
  author    = {Mehek Richharia and Aryan Gupta},
  title     = {Text-to-Image Synthesis with Stable Diffusion: Evaluation and Performance Analysis},
  journal   = {International Journal of Computer Applications},
  year      = {2025},
  volume    = {187},
  number    = {22},
  pages     = {23-30},
  doi       = {10.5120/ijca2025925353},
  publisher = {Foundation of Computer Science (FCS), NY, USA}
}
%0 Journal Article
%D 2025
%A Mehek Richharia
%A Aryan Gupta
%T Text-to-Image Synthesis with Stable Diffusion: Evaluation and Performance Analysis
%J International Journal of Computer Applications
%V 187
%N 22
%P 23-30
%R 10.5120/ijca2025925353
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Recent progress in machine learning, especially in image generation, has led to success in producing high-quality images from text descriptions. Among these advances, the widespread adoption of diffusion models stands out for their robustness, flexibility, and ability to produce realistic and diverse images. Unlike traditional generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), which often suffer from training instability and mode collapse, diffusion models offer a more stable framework for image generation. These models build on the principles of diffusion processes, iteratively transforming random noise into coherent images, which results in improved performance and reliability. This paper provides a comprehensive review of the latest version of Stable Diffusion, focusing on its architecture, core principles, and practical applications. The study compares Stable Diffusion with other leading generative models in terms of image quality, stability, and computational efficiency. It also highlights Hugging Face's role in democratizing AI-driven image generation by making Stable Diffusion accessible through open-source platforms, enabling researchers, developers, and enthusiasts to customize and extend the model for a wide range of innovative and practical applications. The review further considers the broader implications of diffusion models for AI-driven creativity, especially in fields such as art, design, advertising, and entertainment. By analyzing the strengths and limitations of Stable Diffusion, the paper aims to offer insight into its potential to shape the future of image generation technology. It also addresses open challenges, including the need for greater diversity in generated images and for lower computational costs. The paper serves as a resource for researchers and practitioners interested in the evolving landscape of text-to-image synthesis and the transformative potential of diffusion models in artificial intelligence.
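
The accessibility point above can be illustrated with a short sketch. The snippet below is a minimal, illustrative example assuming Hugging Face's open-source diffusers library and a publicly hosted Stable Diffusion checkpoint; the model identifier, prompt, and sampler settings are placeholders chosen for illustration, not details taken from the paper. It loads a pretrained pipeline and runs the iterative denoising loop that turns random noise into an image conditioned on a text prompt.

    # Minimal, illustrative sketch (assumes the `diffusers` and `torch` packages;
    # the checkpoint name and settings below are placeholders, not values from the paper).
    import torch
    from diffusers import StableDiffusionPipeline

    # Download a pretrained Stable Diffusion checkpoint from the Hugging Face Hub.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed public checkpoint
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")  # use pipe.to("cpu") if no GPU is available

    # Iteratively denoise random latent noise, guided by the text prompt.
    result = pipe(
        "a watercolor painting of a lighthouse at sunrise",
        num_inference_steps=50,  # number of denoising iterations
        guidance_scale=7.5,      # strength of text conditioning (classifier-free guidance)
    )
    result.images[0].save("lighthouse.png")

The num_inference_steps and guidance_scale parameters map directly onto the iterative denoising and text conditioning described above; fewer steps trade image quality for speed, which is one axis of the computational-cost concerns the paper raises.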

Index Terms
Computer Science
Information Sciences
Keywords

Text-to-Image, DALL-E, Stable Diffusion, Image Generation
