How to synthesize speech with Qwen3-TTS and build word-level alignment data with Qwen3-ForcedAligner.
Use Python for direct Kokoro timing data, and a simple chunk-duration approximation for JavaScript when the public API exposes only text and audio.
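
One way to read the chunk-duration approximation is as a proportional split: take the measured duration of a synthesized chunk and divide it among the chunk's words by character count. The sketch below illustrates that idea in Python so it can sit alongside the Python timing path; the arithmetic carries over to JavaScript unchanged. The names `WordTiming` and `approximate_word_timings` are hypothetical, and weighting by character count is an assumption of this sketch, not something taken from Kokoro, Qwen3-TTS, or Qwen3-ForcedAligner.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    """Estimated timing for one word (hypothetical helper type)."""
    word: str
    start: float  # seconds
    end: float    # seconds


def approximate_word_timings(text: str, duration: float, offset: float = 0.0) -> list[WordTiming]:
    """Spread a chunk's audio duration across its words by character count.

    text:     the text of one synthesized chunk
    duration: length of that chunk's audio in seconds
    offset:   start time of the chunk within the full audio, in seconds
    """
    words = text.split()
    if not words or duration <= 0:
        return []

    total_chars = sum(len(w) for w in words)
    timings = []
    consumed = 0  # characters assigned to earlier words
    for w in words:
        start = offset + duration * consumed / total_chars
        consumed += len(w)
        end = offset + duration * consumed / total_chars
        timings.append(WordTiming(w, start, end))
    return timings


if __name__ == "__main__":
    for t in approximate_word_timings("hello big world", duration=1.3):
        print(f"{t.word}: {t.start:.2f}s to {t.end:.2f}s")
```

Because the split ignores pauses, punctuation, and speaking rate, the result is only an estimate; timings from a forced aligner or from the model's own timing output should be preferred whenever they are available.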