🎙️ DMOSpeech 2: Zero-Shot Text-to-Speech

Generate natural speech in any voice from just a short reference audio clip!

📎 Reference Audio

📝 Reference Text (leave empty for auto-transcription)

✍️ Text to Generate

🚀 Generation Mode

Choose the tradeoff between speed and quality/diversity

Student Only (4 steps)
Teacher-Guided (8 steps)
High Diversity (16 steps)
Custom

🔊 Generated Speech

💡 Quick Tips:

Auto-transcription: Leave the reference text empty and the reference audio is transcribed automatically
Student Only: Fastest (4 steps), good quality
Teacher-Guided: Best balance (8 steps), recommended
High Diversity: More natural prosody (16 steps)
Custom Mode: Fine-tune all parameters

📊 Expected RTF (Real-Time Factor):

Student Only: ~0.05x (20x faster than real-time)
Teacher-Guided: ~0.10x (10x faster)
High Diversity: ~0.20x (5x faster)
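
The RTF figures above can be read as a simple ratio: RTF is generation time divided by the duration of the generated audio, so the speedup over real-time is 1 / RTF. The sketch below works through that arithmetic using the expected values listed above. The function names are hypothetical and not part of DMOSpeech 2, and actual timings will depend on hardware.

```python
# Expected RTF per generation mode, copied from the list above.
EXPECTED_RTF = {
    "Student Only": 0.05,
    "Teacher-Guided": 0.10,
    "High Diversity": 0.20,
}


def speedup(rtf: float) -> float:
    """Speedup over real-time is the reciprocal of the RTF."""
    return 1.0 / rtf


def estimated_generation_seconds(audio_seconds: float, rtf: float) -> float:
    """RTF = generation time / audio duration, so time = RTF * duration."""
    return rtf * audio_seconds


# Illustrative example: estimate the cost of generating a 10-second clip.
for mode, rtf in EXPECTED_RTF.items():
    print(
        f"{mode}: {speedup(rtf):.0f}x faster than real-time, "
        f"about {estimated_generation_seconds(10.0, rtf):.1f} s for a 10 s clip"
    )
```

Under these expected values, a 10-second clip would take roughly half a second in Student Only mode and about two seconds in High Diversity mode.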