lijunfeng/melltron: "Sing It Out" mellotron open-source project demo @ ab54463a9a72d55a1a8809c9d7cb4a89c218bf8d

"Sing It Out" mellotron open-source project demo

Rafael Valle, Jason Li, Ryan Prenger and Bryan Catanzaro

In our recent paper we propose Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data.

By explicitly conditioning on rhythm and continuous pitch contours from an audio signal or music score, Mellotron is able to generate speech in a variety of styles ranging from read speech to expressive speech, from slow drawls to rap and from monotonous voice to singing voice.

Visit our website for audio samples.

Pre-requisites

NVIDIA GPU with CUDA and cuDNN

Setup

Clone this repo: git clone https://github.com/NVIDIA/mellotron.git
Change into the repo directory: cd mellotron
Initialize submodule: git submodule init; git submodule update
Install PyTorch
Install Apex
Install Python requirements or build the Docker image
- Install Python requirements: pip install -r requirements.txt

Training

Update the filelists inside the filelists folder to point to your data
python train.py --output_directory=outdir --log_directory=logdir
(OPTIONAL) tensorboard --logdir=outdir/logdir
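
Pointing the filelists at your data usually comes down to rewriting the audio path on each line. The sketch below assumes each filelist line is pipe-separated with the audio path as the first field (for example audio_path|text|speaker_id); check your own filelists before relying on it. The helper name `repoint_filelist_lines` and the paths are hypothetical, and the helper keeps only the file name, so it does not preserve nested directory layouts.

```python
import posixpath


def repoint_filelist_lines(lines, new_root):
    """Replace the directory of each audio path with new_root.

    Assumption: lines are pipe-separated and the first field is the
    audio path. All remaining fields are passed through unchanged.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        path, sep, rest = line.partition("|")
        # Keep only the file name and attach it to the new root directory.
        new_path = posixpath.join(new_root, posixpath.basename(path))
        out.append(new_path + sep + rest)
    return out


# Hypothetical example line and target directory.
example = ["DUMMY/LJ001-0001.wav|Printing, in the only sense|0\n"]
print(repoint_filelist_lines(example, "/data/ljs/wavs"))
```

Write the returned lines back to a file in the filelists folder, one per line, and make sure the training and validation filelist settings in hparams.py refer to it.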

Training using a pre-trained model

Starting from a pre-trained model can lead to faster convergence.
By default, the speaker embedding layer is ignored when the pre-trained weights are loaded.

Download our published Mellotron model trained on LibriTTS or LJ Speech (LJS)
python train.py --output_directory=outdir --log_directory=logdir -c models/mellotron_libritts.pt --warm_start
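
Ignoring a layer during a warm start generally means dropping its entry from the checkpoint and keeping the freshly initialized value instead, presumably because a new dataset has a different set of speakers. The following is a minimal sketch of that idea using plain dictionaries in place of real model state; the function name `warm_start_state` and the key `speaker_embedding.weight` are assumptions for illustration, not confirmed details of this repo.

```python
def warm_start_state(checkpoint_state, fresh_state, ignore_layers):
    """Merge checkpoint weights into a fresh state, skipping ignored layers.

    checkpoint_state: name -> weights loaded from a pre-trained model
    fresh_state:      name -> weights of the newly initialized model
    ignore_layers:    names that must keep their fresh initialization
    """
    # Drop every ignored layer from the checkpoint.
    kept = {name: value for name, value in checkpoint_state.items()
            if name not in ignore_layers}
    # Start from the fresh weights, then overwrite with the kept ones.
    merged = dict(fresh_state)
    merged.update(kept)
    return merged


# Toy stand-ins for real tensors; the layer names are hypothetical.
checkpoint = {"encoder.weight": [1.0, 2.0], "speaker_embedding.weight": [9.0]}
fresh = {"encoder.weight": [0.0, 0.0], "speaker_embedding.weight": [0.5]}
merged = warm_start_state(checkpoint, fresh, ["speaker_embedding.weight"])
print(merged)
```

With a real PyTorch model, the merged dictionary would be passed to `load_state_dict`.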

Multi-GPU (distributed) and Automatic Mixed Precision Training

python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True
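
The --hparams flag takes a comma-separated list of name=value overrides. As a rough illustration of how such a string maps onto default settings, here is a small sketch; `parse_overrides` is a hypothetical helper, the repo's own parsing may differ, and this version does not handle list-valued settings that themselves contain commas.

```python
def parse_overrides(spec, defaults):
    """Apply 'name=value,name=value' overrides to a dict of defaults.

    Each value is converted to the type of the corresponding default.
    Unknown names and malformed items raise ValueError.
    """
    result = dict(defaults)
    if not spec:
        return result
    for item in spec.split(","):
        name, sep, raw = item.partition("=")
        name = name.strip()
        if not sep or name not in result:
            raise ValueError("bad override: %r" % item)
        default = result[name]
        if isinstance(default, bool):
            # bool("False") would be True, so compare the text explicitly.
            lowered = raw.strip().lower()
            if lowered not in ("true", "false"):
                raise ValueError("expected True or False: %r" % item)
            result[name] = lowered == "true"
        else:
            result[name] = type(default)(raw.strip())
    return result


# Default values here are illustrative, not taken from hparams.py.
defaults = {"distributed_run": False, "fp16_run": False, "batch_size": 32}
print(parse_overrides("distributed_run=True,fp16_run=True", defaults))
```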

Inference demo

jupyter notebook --ip=127.0.0.1 --port=31337
Load inference.ipynb
(optional) Download our published WaveGlow model

Related repos

WaveGlow: a faster-than-real-time flow-based generative network for speech synthesis.

Acknowledgements

This implementation uses code from repos by Keith Ito, Prem Seetharaman, Chengqi Deng and Patrice Guyot, as described in our code.
