Flux-Sculptor: Text-Driven Rich-Attribute Portrait Editing through Decomposed Spatial Flow Control


Abstract

Text-driven portrait editing holds significant potential for various applications but also presents considerable challenges. An ideal text-driven portrait editing approach should achieve precise localization and appropriate content modification, yet existing methods struggle to balance reconstruction fidelity and editing flexibility. To address this issue, we propose Flux-Sculptor, a flux-based framework designed for precise text-driven portrait editing. Our framework introduces a Prompt-Aligned Spatial Locator (PASL) to accurately identify relevant editing regions and a Structure-to-Detail Edit Control (S2D-EC) strategy to spatially guide the denoising process through sequential mask-guided fusion of latent representations and attention values. Extensive experiments demonstrate that Flux-Sculptor surpasses existing methods in rich-attribute editing and facial information preservation, making it a strong candidate for practical portrait editing applications.
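
To make the idea of mask-guided fusion concrete, the following is a minimal sketch of blending an edited latent with a source latent under a spatial mask. It is not the Flux-Sculptor implementation: the function name `mask_guided_fusion`, the array shapes, and the hard binary mask are all assumptions chosen for illustration, and the actual S2D-EC strategy also fuses attention values across denoising steps, which is not shown here.

```python
import numpy as np

def mask_guided_fusion(edited, source, mask):
    """Keep edited values inside the mask and source values outside it.

    Hypothetical helper for illustration only; shapes are (H, W, C) for the
    latents and (H, W, 1) for the mask, with mask values in [0, 1].
    """
    mask = np.clip(mask.astype(edited.dtype), 0.0, 1.0)
    return mask * edited + (1.0 - mask) * source

# Toy latents: the source is all zeros and the edited latent is all ones,
# so the fused result directly shows which region came from which input.
source = np.zeros((4, 4, 2))
edited = np.ones((4, 4, 2))

# Assumed editing region: the central 2x2 patch.
mask = np.zeros((4, 4, 1))
mask[1:3, 1:3] = 1.0

fused = mask_guided_fusion(edited, source, mask)
```

In this toy setting, the central patch of `fused` takes the edited values while everything outside it is copied from the source, which is the basic mechanism by which a spatial mask can protect identity-relevant regions during editing.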

Text-Based Portrait Editing Results of Flux-Sculptor

Flux-Sculptor faithfully follows text prompts to achieve diverse facial attribute modifications while maintaining visual harmony, naturalness, and identity preservation. Ten text-driven portrait editing results are shown below.

Comparison with State-of-the-Art Editing Methods

Flux-Sculptor outperforms GAN-based, diffusion-based, and rectified-flow-based competitors on both editing and preservation metrics.

Quantitative Performance

Qualitative Visualization

Extra Highlights

Flux-Sculptor also offers several additional capabilities: 1) open-set text-driven facial localization; 2) flexible gender-biased attribute editing; 3) customized mask-guided portrait editing; 4) multi-attribute portrait editing.
