๐ŸŽ™๏ธ DMOSpeech 2: Zero-Shot Text-to-Speech

GitHub Repo

Generate natural speech in any voice from just a short reference audio clip!

🚀 Generation Mode

Choose the tradeoff between speed and quality/diversity

Show detailed generation steps

💡 Quick Tips:

  • Auto-transcription: Leave the reference text empty to have the reference audio transcribed automatically
  • Student Only: Fastest (4 steps), good quality
  • Teacher-Guided: Best balance (8 steps), recommended
  • High Diversity: More natural prosody (16 steps)
  • Custom Mode: Fine-tune all parameters

📊 Expected RTF (Real-Time Factor):

  • Student Only: ~0.05x (20x faster than real-time)
  • Teacher-Guided: ~0.10x (10x faster)
  • High Diversity: ~0.20x (5x faster)
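
The RTF figures above follow the usual definition: synthesis time divided by the duration of the generated audio, so the speedup over real time is 1 / RTF. Here is a minimal sketch of that arithmetic. The function names are illustrative assumptions and are not part of DMOSpeech 2; the RTF values are the approximate ones listed above, not new measurements.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """How many times faster than real time a given RTF is."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Approximate RTF values quoted in the list above (hypothetical dict name).
approx_rtf = {
    "Student Only": 0.05,
    "Teacher-Guided": 0.10,
    "High Diversity": 0.20,
}

for mode, rtf in approx_rtf.items():
    print(f"{mode}: RTF ~{rtf:.2f}, about {speedup(rtf):.0f}x faster than real time")
```

For example, at an RTF of about 0.05, a 10-second utterance would take roughly 0.5 seconds to synthesize. Actual figures depend on hardware and input length.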