PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

Yifan Gao*,1,2, Zihang Lin*,1, Chuanbin Liu†,2, Min Zhou1, Tiezheng Ge1, Bo Zheng1, Hongtao Xie2
1 Taobao & Tmall Group of Alibaba, 2 University of Science and Technology of China
CVPR 2025

* indicates equal contribution, † indicates corresponding author

(a) Definition of the advertising product poster generation task. The input includes the prompt, the subject image, and the texts to be rendered with their layouts. The output is the poster image. (b) Comparison of our method with previous methods. PosterMaker generates posters end-to-end, while previous methods first generate poster backgrounds and then render texts. (c) Visualization results demonstrate that PosterMaker can generate harmonious and aesthetically pleasing posters with accurate texts while maintaining subject fidelity.

Abstract

Product posters, which integrate subject, scene, and text, are crucial promotional tools for attracting customers. Creating such posters using modern image generation methods is valuable, while the main challenge lies in accurately rendering text, especially for complex writing systems like Chinese, which contains over 10,000 individual characters. In this work, we identify the key to precise text rendering as constructing a character-discriminative visual feature as a control signal. Based on this insight, we propose a robust character-wise representation as control and we develop TextRenderNet, which achieves a high text rendering accuracy of over 90%. Another challenge in poster generation is maintaining the fidelity of user-specific products. We address this by introducing SceneGenNet, an inpainting-based model, and propose subject fidelity feedback learning to further enhance fidelity. Based on TextRenderNet and SceneGenNet, we present PosterMaker, an end-to-end generation framework. To optimize PosterMaker efficiently, we implement a two-stage training strategy that decouples text rendering and background generation learning. Experimental results show that PosterMaker outperforms existing baselines by a remarkable margin, which demonstrates its effectiveness.

Poster Generation Challenges

Illustration of the three challenges in poster generation: (a) poor text rendering, (b) text-scene disharmony, and (c) foreground extension, all of which seriously hinder practical application.

Our Method

(a) The overall framework of PosterMaker. (b) The proposed two-stage decoupled training strategy. (c) The proposed character-level visual text representation, which is crucial for high-precision text rendering and requires no additional encoders for training or inference, significantly improving efficiency.

Comparison with Baselines

Comparison of Text Representations for Text Rendering

BibTeX

@misc{gao2025postermakerhighqualityproductposter,
  title={PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering},
  author={Yifan Gao and Zihang Lin and Chuanbin Liu and Min Zhou and Tiezheng Ge and Bo Zheng and Hongtao Xie},
  year={2025},
  eprint={2504.06632},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.06632}
}