Tag: d-VAE

A picture is worth a thousand words

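In AI image generation, there are two distinct approaches. The first is diffusion models, led by Stable Diffusion, which use a series of denoising steps to gradually turn a random-noise image into a finished picture under CLIP guidance. The second splits an image into tokens and generates it autoregressively, token by token; the "autoregressive" part works on the same principle as a language model. By analogy with how a human draws, Stable…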

November 23, 2025