Virtual Try-Off with Diffusion Models: A Systematic Study Toward State-of-the-Art

Sep 27, 2025
Loc-Phat Truong, Meysam Madadi, Sergio Escalera
Abstract
Virtual Try-On (VTON) has advanced rapidly in recent years, bringing substantial improvements in both accuracy and realism. In contrast, its inverse problem, Virtual Try-Off (VTOFF), has only recently started gaining traction. VTOFF aims to reconstruct the canonical garment from its worn version in a person image. It must render the garment in a new pose while accurately preserving its texture. However, it must also reconstruct occluded details, which makes adapting VTON methods to VTOFF non-trivial. In this work, we present a systematic study of diffusion-based approaches for VTOFF, experimenting with various advances from VTON and general Latent Diffusion Models, including the Dual-UNet Diffusion Model architecture and related techniques. Our experiments cover three axes of design: (i) the generation network, comparing Stable Diffusion variants; (ii) conditioning inputs, including ablations on different mask designs, masked/unmasked inputs for IP-Adapter, and high-level semantic features; (iii) losses and training strategies, covering an auxiliary attention-based loss, curriculum schedules, and the impact of perceptual objectives. Extensive experiments on the VITON-HD dataset reveal trade-offs across these options. Our framework achieves state-of-the-art performance, with a 9.5% reduction on the primary metric, DISTS, and competitive performance on LPIPS, FID, KID, and SSIM, providing both stronger baselines and insights for future Virtual Try-Off research.
Type
Publication
Manuscript under review (2025)