Spherical flow matching
Building a spherical support (token projection, decoder/discriminator finetuning, and noise drawn on the
same sphere) already closes most of the gap with vanilla linear flow matching. Replacing the chord with the
slerp geodesic on top of that support gives the best observed trajectory. The diffusion architecture and
conditioning stay unchanged.
1. Project tokens to a fixed radius.
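Each latent token \(z_{i,j} \in \mathbb R^d\) is rescaled as \(z_{i,j} \leftarrow \sqrt d \cdot z_{i,j} / \|z_{i,j}\|\),
which places every token on the sphere of radius \(\sqrt d\). The encoder stays frozen; the decoder is finetuned
to decode projected latents, and the discriminator is finetuned alongside it.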
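A minimal NumPy sketch of the projection, assuming the latent is stored as an array of shape (H, W, d) with the d channels of each token \(z_{i,j}\) on the last axis. The helper name `project_to_sphere` and its `eps` guard are illustrative, and the decoder/discriminator finetuning is not shown.

```python
import numpy as np

def project_to_sphere(z: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale every d-dimensional token (last axis) to norm sqrt(d), keeping its direction."""
    d = z.shape[-1]
    norms = np.linalg.norm(z, axis=-1, keepdims=True)
    # eps only guards against an all-zero token; real latents are not expected to hit it.
    return np.sqrt(d) * z / np.maximum(norms, eps)

# Hypothetical latent grid: 4 x 4 tokens with d = 16 channels each.
rng = np.random.default_rng(0)
latents = 3.0 * rng.standard_normal((4, 4, 16))
projected = project_to_sphere(latents)
print(np.linalg.norm(projected, axis=-1))  # all entries are sqrt(16) = 4 up to rounding
```

Only the norm changes: the cosine between each original and projected token is 1, so the decoder sees the same directions at a fixed radius.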
2. Sample noise on the same sphere.
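Draw \(\epsilon \sim \mathcal N(0, I_d)\), then set \(z_0 = \sqrt d\,\epsilon/\|\epsilon\|\). Because the isotropic
Gaussian is rotation-invariant, \(\epsilon/\|\epsilon\|\) is uniform on the unit sphere, so this keeps the angular
part of the Gaussian prior and discards only its radius.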
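The same normalization applied to Gaussian noise, sketched in NumPy under the same assumed (H, W, d) layout; `sample_sphere_noise` is a hypothetical helper. The second half is a numerical check of a standard fact: for large d, \(\|\epsilon\|/\sqrt d\) concentrates near 1, so the projection removes only a small radial fluctuation.

```python
import numpy as np

def sample_sphere_noise(shape, rng):
    """Gaussian noise with every d-dim token (last axis) rescaled to norm sqrt(d)."""
    eps = rng.standard_normal(shape)
    d = shape[-1]
    return np.sqrt(d) * eps / np.linalg.norm(eps, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
z0 = sample_sphere_noise((4, 4, 16), rng)  # same hypothetical layout as the latents

# Concentration of measure: for large d, ||eps|| / sqrt(d) is already close to 1,
# so normalizing mostly removes a small radial fluctuation, not the direction.
eps = rng.standard_normal((1000, 1024))
ratio = np.linalg.norm(eps, axis=-1) / np.sqrt(1024)
print(ratio.mean(), ratio.std())  # mean near 1, spread roughly 1/sqrt(2 * 1024) ≈ 0.022
```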
3. Train along slerp.
Replace the chord between noise and data with the geodesic arc (slerp), and project the predicted velocity onto
the tangent space of the sphere; at inference, integrate with the exponential map so samples stay on the sphere.
The SiT architecture and conditioning stay unchanged: no auxiliary encoder, no representation-alignment objective.
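A minimal NumPy sketch of the slerp objective and an exponential-map sampler, under stated assumptions: noise \(z_0\) sits at \(t=0\) and data \(z_1\) at \(t=1\), geodesics are taken per token on the radius-\(\sqrt d\) sphere, the sampler is plain Euler with uniform steps, and a hypothetical `velocity_model(x, t)` callable stands in for the SiT network. The helper names, the small-angle fallback to the chord, and the MSE loss are illustrative choices, not the reference implementation.

```python
import numpy as np

def slerp_and_velocity(z0, z1, t, small=1e-6):
    """Geodesic point x_t from z0 (t=0) to z1 (t=1) and its velocity dx_t/dt.

    z0, z1 have shape (..., d) and share the same norm R on the last axis.
    """
    R = np.linalg.norm(z0, axis=-1, keepdims=True)
    u0 = z0 / R
    u1 = z1 / np.linalg.norm(z1, axis=-1, keepdims=True)
    cos_omega = np.clip(np.sum(u0 * u1, axis=-1, keepdims=True), -1.0, 1.0)
    omega = np.arccos(cos_omega)                      # angle between the endpoints
    near = omega < small                              # (almost) identical tokens
    sin_omega = np.where(near, 1.0, np.sin(omega))    # avoid 0/0; masked branch uses the chord
    a = np.where(near, 1.0 - t, np.sin((1.0 - t) * omega) / sin_omega)
    b = np.where(near, t, np.sin(t * omega) / sin_omega)
    da = np.where(near, -1.0, -omega * np.cos((1.0 - t) * omega) / sin_omega)
    db = np.where(near, 1.0, omega * np.cos(t * omega) / sin_omega)
    return R * (a * u0 + b * u1), R * (da * u0 + db * u1)

def project_to_tangent(x, v):
    """Drop the radial component of v at x, token by token."""
    radial = np.sum(v * x, axis=-1, keepdims=True) / np.sum(x * x, axis=-1, keepdims=True)
    return v - radial * x

def exp_map(x, w, small=1e-12):
    """Follow the great circle from x in tangent direction w for arc length ||w||."""
    R = np.linalg.norm(x, axis=-1, keepdims=True)
    w_norm = np.linalg.norm(w, axis=-1, keepdims=True)
    theta = w_norm / R                                # arc length -> angle
    return np.cos(theta) * x + R * np.sin(theta) * w / np.maximum(w_norm, small)

def slerp_fm_loss(velocity_model, z0, z1, t):
    """Regress the tangent-projected prediction onto the slerp velocity."""
    x_t, v_target = slerp_and_velocity(z0, z1, t)
    v_pred = project_to_tangent(x_t, velocity_model(x_t, t))
    return np.mean((v_pred - v_target) ** 2)

def sample(velocity_model, z0, steps=50):
    """Euler integration on the sphere: tangent projection, then exponential map."""
    x = z0
    for k in range(steps):
        t = k / steps
        v = project_to_tangent(x, velocity_model(x, t))
        x = exp_map(x, v / steps)
    return x

# Toy check with an oracle that returns the exact slerp velocity for fixed
# endpoints (a stand-in for a trained network).
rng = np.random.default_rng(0)
d = 16
z0 = rng.standard_normal((4, 4, d))
z0 = np.sqrt(d) * z0 / np.linalg.norm(z0, axis=-1, keepdims=True)
z1 = rng.standard_normal((4, 4, d))
z1 = np.sqrt(d) * z1 / np.linalg.norm(z1, axis=-1, keepdims=True)

def oracle(x, t):
    return slerp_and_velocity(z0, z1, t)[1]

print(slerp_fm_loss(oracle, z0, z1, 0.3))  # 0 up to rounding
x1 = sample(oracle, z0)
print(np.abs(x1 - z1).max())               # tiny: the sampler lands on z1
```

Because the slerp path moves at constant angular speed, each exponential-map step along the exact velocity lands on the next point of the arc. Replacing `exp_map(x, v / steps)` with the ambient update `x + v / steps` would instead raise the norm to \(\sqrt{R^2 + \|v\|^2/\text{steps}^2} > R\) at every step, which is the drift the exponential map avoids.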