!!! warning
    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
    This codebase and all models are released under the CC-BY-NC-SA-4.0 license.
Professional Windows users may consider using WSL2 or Docker to run the codebase, or set up a native environment with the following commands:

```bash
# Create a Python 3.10 virtual environment (you can also use virtualenv)
conda create -n fish-speech python=3.10
conda activate fish-speech

# Install PyTorch (CUDA 12.1 build)
pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121

# Install fish-speech
pip3 install -e .

# (Enable acceleration) Install triton-windows
pip install https://github.com/AnyaCoder/fish-speech/releases/download/v0.1.0/triton_windows-0.1.0-py3-none-any.whl
```
Non-professional Windows users can use the following basic method to run the project without a Linux environment (with model compilation support, i.e., `torch.compile`):

1. Run `install_env.bat` to install the environment.
2. To enable compilation acceleration:
    1. Install the LLVM compiler: double-click `LLVM-17.0.6-win64.exe`, select an appropriate installation location, and, most importantly, check the `Add Path to Current User` option to add the environment variable.
    2. In the Visual Studio Installer, click the `Modify` button, then find and select the `Desktop development with C++` option and download it.
3. Double-click `start.bat` to open the training and inference WebUI management interface. If needed, you can modify `API_FLAGS` as described below.

!!! info "Optional"
    Want to start the inference WebUI?

    Edit the `API_FLAGS.txt` file in the project root directory and modify the first three lines as follows:

    ```
    --infer
    # --api
    # --listen ...
    ...
    ```
!!! info "Optional"
    Want to start the API server?

    Edit the `API_FLAGS.txt` file in the project root directory and modify the first three lines as follows:

    ```
    # --infer
    --api
    --listen ...
    ...
    ```
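The two variants above differ only in which lines are commented out with `#`. The following is a minimal sketch that assumes lines starting with `#` are ignored, which is what the examples imply. The hypothetical `active_flags` helper, which is not shipped with the project, lists the flags that would take effect. You can run it from WSL or Git Bash, for example:

```bash
# Hypothetical helper (not part of the project): print the lines of a flags
# file that are neither blank nor commented out with "#".
active_flags() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Usage from the project root:
#   active_flags API_FLAGS.txt
```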
!!! info "Optional"
    Double-click `run_cmd.bat` to enter the conda/python command line environment of this project.
On Linux, set up the environment as follows (see `pyproject.toml` for the full list of dependencies):
```bash
# Create a Python 3.10 virtual environment (you can also use virtualenv)
conda create -n fish-speech python=3.10
conda activate fish-speech

# Install PyTorch
pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1

# (Ubuntu / Debian users) Install sox + ffmpeg
sudo apt install libsox-dev ffmpeg

# (Ubuntu / Debian users) Install the system dependencies for pyaudio
sudo apt install build-essential \
    cmake \
    libasound-dev \
    portaudio19-dev \
    libportaudio2 \
    libportaudiocpp0

# Install fish-speech (quoted so shells such as zsh do not expand the brackets)
pip3 install -e ".[stable]"
```
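You can optionally run a quick sanity check to confirm that the command-line tools used above are on your `PATH`. This check is not part of the official steps, and the `missing_commands` helper is hypothetical. `libsox-dev` provides development libraries rather than a `sox` command, so among the system packages only `ffmpeg` is checked.

```bash
# Print each named command that cannot be found on PATH (one per line).
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Run inside the activated conda environment; empty output means all were found.
missing_commands ffmpeg python pip3
```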
On macOS, to run inference on the GPU via MPS (Metal Performance Shaders), add the `--device mps` flag.
Please refer to this PR for a comparison of inference speeds.
!!! warning
    The `compile` option is not officially supported on Apple Silicon devices, so there is no guarantee that inference speed will improve.
```bash
# Create a Python 3.10 virtual environment (you can also use virtualenv)
conda create -n fish-speech python=3.10
conda activate fish-speech

# Install PyTorch
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1

# Install fish-speech (quoted so zsh, the default macOS shell, does not expand the brackets)
pip install -e ".[stable]"
```
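After installing, you can check whether your PyTorch build can use the MPS backend before passing `--device mps`. This is a sketch built on PyTorch's `torch.backends.mps.is_available()`. The `mps_status` wrapper is hypothetical and assumes that `python3` resolves to the interpreter in the activated conda environment.

```bash
# Print True/False from PyTorch's MPS check, or a fallback message when
# python3 or PyTorch cannot be loaded.
mps_status() {
  python3 -c 'import torch; print(torch.backends.mps.is_available())' 2>/dev/null \
    || echo "unknown (PyTorch not importable)"
}

mps_status
```

If it prints `False`, this PyTorch build cannot use MPS, and the `--device mps` flag is unlikely to work.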
To use the GPU for model training and inference in Docker, first install the NVIDIA Container Toolkit.

For Ubuntu users:
```bash
# Add repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker service
sudo systemctl restart docker
```
For other Linux distributions, please refer to the NVIDIA Container Toolkit installation guide.
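The `sed` stage in the Ubuntu repository setup rewrites each `deb https://` entry. After the rewrite, apt verifies that repository with the keyring created by the `gpg --dearmor` step, through apt's `signed-by` option. You can see the effect without touching your system by running the same expression on a sample line. The sample below illustrates the `.list` file's format; it is an assumption about the upstream file, not content fetched from NVIDIA.

```bash
# Illustrative input line (an assumption about the upstream file's format).
sample='deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /'

# Same substitution as in the setup command: insert the signed-by option after "deb".
rewritten=$(printf '%s\n' "$sample" |
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g')

printf '%s\n' "$rewritten"
```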
Then pull and run the fish-speech image:

```bash
# Pull the image
docker pull fishaudio/fish-speech:latest-dev

# Run the image
docker run -it \
    --name fish-speech \
    --gpus all \
    -p 7860:7860 \
    fishaudio/fish-speech:latest-dev \
    zsh

# To use a different host port, change the -p parameter to YourPort:7860
```
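In `-p YourPort:7860`, the left side is the port on the host and the right side is the port inside the container, which stays 7860. Below is a minimal sketch that builds and validates this value for a chosen host port. The `port_mapping` helper is hypothetical.

```bash
# Print the value for docker run's -p option: <host port>:7860.
# The container side stays 7860; only the host side changes.
port_mapping() {
  local host_port="${1:-7860}"
  case "$host_port" in
    ''|*[!0-9]*) echo "invalid port: $host_port" >&2; return 1 ;;
  esac
  if [ "$host_port" -lt 1 ] || [ "$host_port" -gt 65535 ]; then
    echo "port out of range: $host_port" >&2
    return 1
  fi
  printf '%s:7860\n' "$host_port"
}

# Example: publish the WebUI on host port 8080, then open http://localhost:8080
#   docker run -it --name fish-speech --gpus all -p "$(port_mapping 8080)" fishaudio/fish-speech:latest-dev zsh
```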
Next, download the model dependencies. Make sure you are in the terminal inside the Docker container, then download the required `vqgan` and `llama` models from our Hugging Face repository:

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
```
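Before starting the WebUI, you can run a coarse check that the download produced a non-empty directory at the path given to `--local-dir`. This is a sketch with a hypothetical `checkpoint_ready` helper. It does not verify individual files or their integrity.

```bash
# Succeed only if the directory exists and contains at least one entry.
checkpoint_ready() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

if checkpoint_ready checkpoints/fish-speech-1.4; then
  echo "checkpoints/fish-speech-1.4 is present"
else
  echo "checkpoints/fish-speech-1.4 is missing or empty; re-run the download" >&2
fi
```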
In the terminal inside the Docker container, run `export GRADIO_SERVER_NAME="0.0.0.0"` to allow external access to the Gradio service inside Docker.
Then, in the same terminal, run `python tools/webui.py` to start the WebUI service.

If you're using WSL or macOS, visit http://localhost:7860 to open the WebUI.

If it is deployed on a server, replace `localhost` with your server's IP address.
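`GRADIO_SERVER_NAME=0.0.0.0` makes Gradio listen on all network interfaces inside the container. By default Gradio binds to `127.0.0.1`, which the port published with `-p` cannot reach. On the host side, the address to open follows the pattern `http://<host>:<host port>`. The hypothetical `webui_url` helper below simply assembles it, with `localhost` and `7860` as defaults.

```bash
# Assemble the WebUI address; arguments default to localhost and port 7860.
webui_url() {
  local host="${1:-localhost}"
  local port="${2:-7860}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# On a server, pass its IP (and the host port if you changed -p), e.g.:
#   webui_url 203.0.113.10 8080
# Optional reachability check (requires curl):
#   curl -fsS -o /dev/null "$(webui_url)" && echo "WebUI is up"
```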
Selected changelog entries:

- Added `lora` fine-tuning support.
- Added `gradient checkpointing`, `causal sampling`, and `flash-attn` support.
- Updated the `text2semantic` model to support phoneme-free mode.