Abstract
Diffusion models have made their mark in image synthesis, excelling in visual quality and flexibility. They commonly use an additional negative prompt with classifier-free guidance (CFG), which steers the model toward images aligned with the user's intent. However, CFG requires the model to run twice per denoising step, making it difficult to interpret the impact of the negative prompt on the final image. My study proposes a method to generate a single prompt that matches the quality of the two-prompt, two-pass CFG. I prepared a prompt-to-image dataset, used per-image optimization to find a ground-truth single merged prompt for each image, and trained a neural network module to predict the embedding of that prompt. At inference time, my model produces a single prompt and runs a single diffusion pass, achieving up to a 2x speedup and a 20% memory reduction. My research contributes to developing more efficient diffusion models and to a deeper understanding of their characteristics.
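As background, the two-pass cost arises from the standard CFG formulation (a textbook sketch, not taken from this work; the symbols $c_{\text{pos}}$, $c_{\text{neg}}$, and $c_{\text{merged}}$ are illustrative):

```latex
% Standard CFG: two denoiser evaluations per step, one per prompt.
\hat{\epsilon}_\theta(x_t) =
  \epsilon_\theta(x_t, c_{\text{neg}})
  + w \left( \epsilon_\theta(x_t, c_{\text{pos}}) - \epsilon_\theta(x_t, c_{\text{neg}}) \right)

% The proposed approach would instead use a single merged prompt,
% requiring only one evaluation per step:
\hat{\epsilon}_\theta(x_t) = \epsilon_\theta(x_t, c_{\text{merged}})
```

Here $w$ is the guidance scale; collapsing the two evaluations into one is what yields the reported speedup and memory savings.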