
Fix errors in mkdocs.yml and ipynb (#988)

PoTaTo, 10 months ago
parent commit 3d31a80ad1
18 files changed, with 723 additions and 723 deletions:

  1. docs/en/index.md (+35, -93)
  2. docs/en/inference.md (+108, -0)
  3. docs/en/install.md (+0, -50)
  4. docs/ja/index.md (+35, -92)
  5. docs/ja/inference.md (+107, -0)
  6. docs/ja/install.md (+0, -50)
  7. docs/ko/index.md (+35, -92)
  8. docs/ko/inference.md (+107, -0)
  9. docs/ko/install.md (+0, -50)
  10. docs/pt/index.md (+35, -92)
  11. docs/pt/inference.md (+107, -0)
  12. docs/pt/install.md (+0, -50)
  13. docs/zh/index.md (+35, -92)
  14. docs/zh/inference.md (+107, -0)
  15. docs/zh/install.md (+0, -50)
  16. inference.ipynb (+6, -6)
  17. mkdocs.yml (+5, -5)
  18. pyproject.toml (+1, -1)

+ 35 - 93
docs/en/index.md

@@ -1,108 +1,50 @@
-# Inference
+# Introduction
 
-As the vocoder model has been changed, you need more VRAM than before, 12GB is recommended for fluently inference.
+<div>
+<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
+<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
+<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
+<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
+</a>
+</div>
 
-We support command line, HTTP API and WebUI for inference, you can choose any method you like.
+!!! warning
+    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
+    This codebase is released under the Apache 2.0 license, and all models are released under the CC-BY-NC-SA-4.0 license.
 
-## Download Weights
+## Requirements
 
-First you need to download the model weights:
+- GPU Memory: 12GB (Inference)
+- System: Linux, Windows
 
-```bash
-huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
-```
-
-## Command Line Inference
+## Setup
 
-!!! note
-    If you plan to let the model randomly choose a voice timbre, you can skip this step.
-
-### 1. Get VQ tokens from reference audio
+First, we need to create a conda environment to install the packages.
 
 ```bash
-python fish_speech/models/dac/inference.py \
-    -i "ref_audio_name.wav" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
-```
 
-You should get a `fake.npy` and a `fake.wav`.
+conda create -n fish-speech python=3.12
+conda activate fish-speech
 
-### 2. Generate semantic tokens from text:
+sudo apt-get install portaudio19-dev # For pyaudio
+pip install -e . # This will install all remaining packages.
 
-```bash
-python fish_speech/models/text2semantic/inference.py \
-    --text "The text you want to convert" \
-    --prompt-text "Your reference text" \
-    --prompt-tokens "fake.npy" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --num-samples 2 \
-    --compile # if you want a faster speed
+apt install libsox-dev ffmpeg # If needed.
 ```
 
-This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
-
-!!! note
-    You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
-    Correspondingly, if you do not plan to use acceleration, you can comment out the `--compile` parameter.
-
-!!! info
-    For GPUs that do not support bf16, you may need to use the `--half` parameter.
-
-### 3. Generate vocals from semantic tokens:
-
-#### VQGAN Decoder
-
-!!! warning "Future Warning"
-    We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "codes_0.npy" \
-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
-```
-
-## HTTP API Inference
-
-We provide a HTTP API for inference. You can use the following command to start the server:
-
-```bash
-python -m tools.api_server \
-    --listen 0.0.0.0:8080 \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
-
-> If you want to speed up inference, you can add the `--compile` parameter.
-
-After that, you can view and test the API at http://127.0.0.1:8080/.
-
-## GUI Inference 
-[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
-
-## WebUI Inference
-
-You can start the WebUI using the following command:
-
-```bash
-python -m tools.run_webui \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
-
-Or simply
-
-```bash
-python -m tools.run_webui
-```
-> If you want to speed up inference, you can add the `--compile` parameter.
-
-
-!!! note
-    You can save the label file and reference audio file in advance to the `references` folder in the main directory (which you need to create yourself), so that you can directly call them in the WebUI.
+!!! warning
+    The `compile` option is not supported on Windows and macOS. If you want to run with compile, you need to install Triton yourself.
 
-!!! note
-    You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME` to configure WebUI.
+## Acknowledgements
 
-Enjoy!
+- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
+- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
+- [GPT VITS](https://github.com/innnky/gpt-vits)
+- [MQTTS](https://github.com/b04901014/MQTTS)
+- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
+- [Transformers](https://github.com/huggingface/transformers)
+- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
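The 12GB GPU-memory requirement listed in the new index page can be sanity-checked before launching inference. The sketch below is illustrative and not part of the repository; it assumes `torch` is available after `pip install -e .` and degrades gracefully when it is not.

```python
def meets_vram_requirement(free_bytes: int, required_gb: int = 12) -> bool:
    """Check whether reported free GPU memory covers the documented
    12GB inference requirement (pure helper, testable without a GPU)."""
    return free_bytes >= required_gb * 1024**3

try:
    import torch  # installed by the setup steps above

    if torch.cuda.is_available():
        # mem_get_info() returns (free, total) in bytes for the current device.
        free, _total = torch.cuda.mem_get_info()
        print("Enough VRAM for inference:", meets_vram_requirement(free))
    else:
        print("No CUDA device detected.")
except ImportError:
    print("torch is not installed yet; run the setup steps above first.")
```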

+ 108 - 0
docs/en/inference.md

@@ -0,0 +1,108 @@
+# Inference
+
+As the vocoder model has been changed, you need more VRAM than before; 12GB is recommended for fluent inference.
+
+We support command line, HTTP API, and WebUI inference; you can choose whichever method you like.
+
+## Download Weights
+
+First you need to download the model weights:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## Command Line Inference
+
+!!! note
+    If you plan to let the model randomly choose a voice timbre, you can skip this step.
+
+### 1. Get VQ tokens from reference audio
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+You should get a `fake.npy` and a `fake.wav`.
+
+### 2. Generate semantic tokens from text:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "The text you want to convert" \
+    --prompt-text "Your reference text" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # if you want a faster speed
+```
+
+This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
+
+!!! note
+    You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second).
+    Conversely, if you do not plan to use acceleration, you can omit the `--compile` parameter.
+
+!!! info
+    For GPUs that do not support bf16, you may need to use the `--half` parameter.
+
+### 3. Generate vocals from semantic tokens:
+
+#### VQGAN Decoder
+
+!!! warning "Future Warning"
+    We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+## HTTP API Inference
+
+We provide an HTTP API for inference. You can use the following command to start the server:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> If you want to speed up inference, you can add the `--compile` parameter.
+
+After that, you can view and test the API at http://127.0.0.1:8080/.
+
+## GUI Inference 
+[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## WebUI Inference
+
+You can start the WebUI using the following command:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+Or simply
+
+```bash
+python -m tools.run_webui
+```
+> If you want to speed up inference, you can add the `--compile` parameter.
+
+
+!!! note
+    You can save label files and reference audio files in advance to a `references` folder in the project root (which you need to create yourself), so that they can be used directly in the WebUI.
+
+!!! note
+    You can use Gradio environment variables, such as `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, and `GRADIO_SERVER_NAME`, to configure the WebUI.
+
+Enjoy!
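The HTTP API server added in this page can be exercised from a short standard-library client. The `/v1/tts` endpoint path and the payload fields below are assumptions for illustration, not confirmed by this diff; check the live API docs at http://127.0.0.1:8080/ for the actual schema.

```python
import json
from urllib import request

def build_tts_request(text: str, base_url: str = "http://127.0.0.1:8080"):
    """Build (but do not send) a POST request for the fish-speech API
    server started with `python -m tools.api_server`. Field names are
    hypothetical; consult the server's own docs for the real schema."""
    payload = {
        "text": text,        # text to synthesize
        "references": [],    # optional reference-audio entries
        "format": "wav",     # requested audio container
    }
    return request.Request(
        f"{base_url}/v1/tts",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_tts_request("Hello from fish-speech!")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return the synthesized audio bytes if the server is running and the assumed schema matches.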

+ 0 - 50
docs/en/install.md

@@ -1,50 +0,0 @@
-# Introduction
-
-<div>
-<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
-<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
-<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
-<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
-</a>
-</div>
-
-!!! warning
-    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
-    This codebase is released under Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
-
-## Requirements
-
-- GPU Memory: 12GB (Inference)
-- System: Linux, Windows
-
-## Setup
-
-First, we need to create a conda environment to install the packages.
-
-```bash
-
-conda create -n fish-speech python=3.12
-conda activate fish-speech
-
-pip install sudo apt-get install portaudio19-dev # For pyaudio
-pip install -e . # This will download all rest packages.
-
-apt install libsox-dev ffmpeg # If needed.
-```
-
-!!! warning
-    The `compile` option is not supported on windows and macOS, if you want to run with compile, you need to install trition by yourself.
-
-## Acknowledgements
-
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 35 - 92
docs/ja/index.md

@@ -1,107 +1,50 @@
-# 推論
+# 紹介
 
-ボコーダーモデルが変更されたため、以前よりも多くのVRAMが必要です。スムーズな推論には12GBを推奨します。
+<div>
+<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
+<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
+<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
+<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
+</a>
+</div>
 
-推論には、コマンドライン、HTTP API、WebUIをサポートしており、お好きな方法を選択できます。
+!!! warning
+    このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCA(デジタルミレニアム著作権法)およびその他の関連法規をご参照ください。<br/>
+    このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
 
-## 重みのダウンロード
+## システム要件
 
-まず、モデルの重みをダウンロードする必要があります:
+- GPU メモリ:12GB(推論)
+- システム:Linux、Windows
 
-```bash
-huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
-```
-
-## コマンドライン推論
-
-!!! note
-    モデルにランダムに音色を選択させる場合は、この手順をスキップできます。
-
-### 1. 参照音声からVQトークンを取得
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "ref_audio_name.wav" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
-```
-
-`fake.npy` と `fake.wav` が得られるはずです。
-
-### 2. テキストからセマンティックトークンを生成:
-
-```bash
-python fish_speech/models/text2semantic/inference.py \
-    --text "変換したいテキスト" \
-    --prompt-text "参照テキスト" \
-    --prompt-tokens "fake.npy" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --num-samples 2 \
-    --compile # より高速化を求める場合
-```
-
-このコマンドは、作業ディレクトリに `codes_N` ファイルを作成します(Nは0から始まる整数)。
-
-!!! note
-    より高速な推論のために `--compile` を使用してCUDAカーネルを融合することができます(約30トークン/秒 -> 約500トークン/秒)。
-    対応して、加速を使用しない場合は、`--compile` パラメータをコメントアウトできます。
-
-!!! info
-    bf16をサポートしないGPUの場合、`--half` パラメータの使用が必要かもしれません。
+## セットアップ
 
-### 3. セマンティックトークンから音声を生成:
-
-#### VQGANデコーダー
-
-!!! warning "将来の警告"
-    元のパス(tools/vqgan/inference.py)からアクセス可能なインターフェースを維持していますが、このインターフェースは後続のリリースで削除される可能性があるため、できるだけ早くコードを変更してください。
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "codes_0.npy" \
-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
-```
-
-## HTTP API推論
-
-推論用のHTTP APIを提供しています。以下のコマンドでサーバーを開始できます:
+まず、パッケージをインストールするためのconda環境を作成する必要があります。
 
 ```bash
-python -m tools.api_server \
-    --listen 0.0.0.0:8080 \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
 
-> 推論を高速化したい場合は、`--compile` パラメータを追加できます。
+conda create -n fish-speech python=3.12
+conda activate fish-speech
 
-その後、http://127.0.0.1:8080/ でAPIを表示・テストできます。
+sudo apt-get install portaudio19-dev # pyaudio用
+pip install -e . # これにより残りのパッケージがすべてダウンロードされます。
 
-## GUI推論 
-[クライアントをダウンロード](https://github.com/AnyaCoder/fish-speech-gui/releases)
-
-## WebUI推論
-
-以下のコマンドでWebUIを開始できます:
-
-```bash
-python -m tools.run_webui \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
-
-または単純に
-
-```bash
-python -m tools.run_webui
+apt install libsox-dev ffmpeg # 必要に応じて。
 ```
-> 推論を高速化したい場合は、`--compile` パラメータを追加できます。
 
-!!! note
-    ラベルファイルと参照音声ファイルをメインディレクトリの `references` フォルダに事前に保存することができます(自分で作成する必要があります)。これにより、WebUIで直接呼び出すことができます。
+!!! warning
+    `compile`オプションはWindowsとmacOSでサポートされていません。compileで実行したい場合は、Tritonを自分でインストールする必要があります。
 
-!!! note
-    `GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` などのGradio環境変数を使用してWebUIを設定できます。
+## 謝辞
 
-お楽しみください!
+- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
+- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
+- [GPT VITS](https://github.com/innnky/gpt-vits)
+- [MQTTS](https://github.com/b04901014/MQTTS)
+- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
+- [Transformers](https://github.com/huggingface/transformers)
+- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 107 - 0
docs/ja/inference.md

@@ -0,0 +1,107 @@
+# 推論
+
+ボコーダーモデルが変更されたため、以前よりも多くのVRAMが必要です。スムーズな推論には12GBを推奨します。
+
+推論には、コマンドライン、HTTP API、WebUIをサポートしており、お好きな方法を選択できます。
+
+## 重みのダウンロード
+
+まず、モデルの重みをダウンロードする必要があります:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## コマンドライン推論
+
+!!! note
+    モデルにランダムに音色を選択させる場合は、この手順をスキップできます。
+
+### 1. 参照音声からVQトークンを取得
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+`fake.npy` と `fake.wav` が得られるはずです。
+
+### 2. テキストからセマンティックトークンを生成:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "変換したいテキスト" \
+    --prompt-text "参照テキスト" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # より高速化を求める場合
+```
+
+このコマンドは、作業ディレクトリに `codes_N` ファイルを作成します(Nは0から始まる整数)。
+
+!!! note
+    より高速な推論のために `--compile` を使用してCUDAカーネルを融合することができます(約30トークン/秒 -> 約500トークン/秒)。
+    対応して、加速を使用しない場合は、`--compile` パラメータをコメントアウトできます。
+
+!!! info
+    bf16をサポートしないGPUの場合、`--half` パラメータの使用が必要かもしれません。
+
+### 3. セマンティックトークンから音声を生成:
+
+#### VQGANデコーダー
+
+!!! warning "将来の警告"
+    元のパス(tools/vqgan/inference.py)からアクセス可能なインターフェースを維持していますが、このインターフェースは後続のリリースで削除される可能性があるため、できるだけ早くコードを変更してください。
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+## HTTP API推論
+
+推論用のHTTP APIを提供しています。以下のコマンドでサーバーを開始できます:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> 推論を高速化したい場合は、`--compile` パラメータを追加できます。
+
+その後、http://127.0.0.1:8080/ でAPIを表示・テストできます。
+
+## GUI推論 
+[クライアントをダウンロード](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## WebUI推論
+
+以下のコマンドでWebUIを開始できます:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+または単純に
+
+```bash
+python -m tools.run_webui
+```
+> 推論を高速化したい場合は、`--compile` パラメータを追加できます。
+
+!!! note
+    ラベルファイルと参照音声ファイルをメインディレクトリの `references` フォルダに事前に保存することができます(自分で作成する必要があります)。これにより、WebUIで直接呼び出すことができます。
+
+!!! note
+    `GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` などのGradio環境変数を使用してWebUIを設定できます。
+
+お楽しみください!

+ 0 - 50
docs/ja/install.md

@@ -1,50 +0,0 @@
-# 紹介
-
-<div>
-<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
-<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
-<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
-<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
-</a>
-</div>
-
-!!! warning
-    このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCA(デジタルミレニアム著作権法)およびその他の関連法規をご参照ください。<br/>
-    このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
-
-## システム要件
-
-- GPU メモリ:12GB(推論)
-- システム:Linux、Windows
-
-## セットアップ
-
-まず、パッケージをインストールするためのconda環境を作成する必要があります。
-
-```bash
-
-conda create -n fish-speech python=3.12
-conda activate fish-speech
-
-pip install sudo apt-get install portaudio19-dev # pyaudio用
-pip install -e . # これにより残りのパッケージがすべてダウンロードされます。
-
-apt install libsox-dev ffmpeg # 必要に応じて。
-```
-
-!!! warning
-    `compile`オプションはWindowsとmacOSでサポートされていません。compileで実行したい場合は、tritionを自分でインストールする必要があります。
-
-## 謝辞
-
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 35 - 92
docs/ko/index.md

@@ -1,107 +1,50 @@
-# 추론
+# 소개
 
-보코더 모델이 변경되어 이전보다 더 많은 VRAM이 필요하며, 원활한 추론을 위해 12GB를 권장합니다.
+<div>
+<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
+<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
+<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
+<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
+</a>
+</div>
 
-추론을 위해 명령줄, HTTP API, WebUI를 지원하며, 원하는 방법을 선택할 수 있습니다.
+!!! warning
+    코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다. <br/>
+    이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
 
-## 가중치 다운로드
+## 시스템 요구사항
 
-먼저 모델 가중치를 다운로드해야 합니다:
+- GPU 메모리: 12GB (추론)
+- 시스템: Linux, Windows
 
-```bash
-huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
-```
-
-## 명령줄 추론
-
-!!! note
-    모델이 임의로 음색을 선택하도록 하려면 이 단계를 건너뛸 수 있습니다.
-
-### 1. 참조 오디오에서 VQ 토큰 얻기
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "ref_audio_name.wav" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
-```
-
-`fake.npy`와 `fake.wav`를 얻을 수 있습니다.
-
-### 2. 텍스트에서 의미 토큰 생성:
-
-```bash
-python fish_speech/models/text2semantic/inference.py \
-    --text "변환하고 싶은 텍스트" \
-    --prompt-text "참조 텍스트" \
-    --prompt-tokens "fake.npy" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --num-samples 2 \
-    --compile # 더 빠른 속도를 원한다면
-```
-
-이 명령은 작업 디렉토리에 `codes_N` 파일을 생성합니다. 여기서 N은 0부터 시작하는 정수입니다.
-
-!!! note
-    더 빠른 추론을 위해 `--compile`을 사용하여 CUDA 커널을 융합할 수 있습니다(약 30 토큰/초 -> 약 500 토큰/초).
-    이에 따라 가속을 사용하지 않으려면 `--compile` 매개변수를 주석 처리할 수 있습니다.
-
-!!! info
-    bf16을 지원하지 않는 GPU의 경우 `--half` 매개변수를 사용해야 할 수 있습니다.
+## 설치
 
-### 3. 의미 토큰에서 음성 생성:
-
-#### VQGAN 디코더
-
-!!! warning "향후 경고"
-    원래 경로(tools/vqgan/inference.py)에서 액세스 가능한 인터페이스를 유지하고 있지만, 이 인터페이스는 향후 릴리스에서 제거될 수 있으므로 가능한 한 빨리 코드를 변경해 주세요.
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "codes_0.npy" \
-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
-```
-
-## HTTP API 추론
-
-추론을 위한 HTTP API를 제공합니다. 다음 명령으로 서버를 시작할 수 있습니다:
+먼저 패키지를 설치하기 위한 conda 환경을 만들어야 합니다.
 
 ```bash
-python -m tools.api_server \
-    --listen 0.0.0.0:8080 \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
 
-> 추론을 가속화하려면 `--compile` 매개변수를 추가할 수 있습니다.
+conda create -n fish-speech python=3.12
+conda activate fish-speech
 
-그 후 http://127.0.0.1:8080/ 에서 API를 보고 테스트할 수 있습니다.
+sudo apt-get install portaudio19-dev # pyaudio용
+pip install -e . # 나머지 모든 패키지를 다운로드합니다.
 
-## GUI 추론 
-[클라이언트 다운로드](https://github.com/AnyaCoder/fish-speech-gui/releases)
-
-## WebUI 추론
-
-다음 명령으로 WebUI를 시작할 수 있습니다:
-
-```bash
-python -m tools.run_webui \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
-
-또는 간단히
-
-```bash
-python -m tools.run_webui
+apt install libsox-dev ffmpeg # 필요한 경우.
 ```
-> 추론을 가속화하려면 `--compile` 매개변수를 추가할 수 있습니다.
 
-!!! note
-    라벨 파일과 참조 오디오 파일을 메인 디렉토리의 `references` 폴더에 미리 저장할 수 있습니다(직접 생성해야 함). 이렇게 하면 WebUI에서 직접 호출할 수 있습니다.
+!!! warning
+    `compile` 옵션은 Windows와 macOS에서 지원되지 않습니다. compile로 실행하려면 Triton을 직접 설치해야 합니다.
 
-!!! note
-    `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME`과 같은 Gradio 환경 변수를 사용하여 WebUI를 구성할 수 있습니다.
+## 감사의 말
 
-즐기세요!
+- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
+- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
+- [GPT VITS](https://github.com/innnky/gpt-vits)
+- [MQTTS](https://github.com/b04901014/MQTTS)
+- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
+- [Transformers](https://github.com/huggingface/transformers)
+- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 107 - 0
docs/ko/inference.md

@@ -0,0 +1,107 @@
+# 추론
+
+보코더 모델이 변경되어 이전보다 더 많은 VRAM이 필요하며, 원활한 추론을 위해 12GB를 권장합니다.
+
+추론을 위해 명령줄, HTTP API, WebUI를 지원하며, 원하는 방법을 선택할 수 있습니다.
+
+## 가중치 다운로드
+
+먼저 모델 가중치를 다운로드해야 합니다:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## 명령줄 추론
+
+!!! note
+    모델이 임의로 음색을 선택하도록 하려면 이 단계를 건너뛸 수 있습니다.
+
+### 1. 참조 오디오에서 VQ 토큰 얻기
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+`fake.npy`와 `fake.wav`를 얻을 수 있습니다.
+
+### 2. 텍스트에서 의미 토큰 생성:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "변환하고 싶은 텍스트" \
+    --prompt-text "참조 텍스트" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # 더 빠른 속도를 원한다면
+```
+
+이 명령은 작업 디렉토리에 `codes_N` 파일을 생성합니다. 여기서 N은 0부터 시작하는 정수입니다.
+
+!!! note
+    더 빠른 추론을 위해 `--compile`을 사용하여 CUDA 커널을 융합할 수 있습니다(약 30 토큰/초 -> 약 500 토큰/초).
+    이에 따라 가속을 사용하지 않으려면 `--compile` 매개변수를 주석 처리할 수 있습니다.
+
+!!! info
+    bf16을 지원하지 않는 GPU의 경우 `--half` 매개변수를 사용해야 할 수 있습니다.
+
+### 3. 의미 토큰에서 음성 생성:
+
+#### VQGAN 디코더
+
+!!! warning "향후 경고"
+    원래 경로(tools/vqgan/inference.py)에서 액세스 가능한 인터페이스를 유지하고 있지만, 이 인터페이스는 향후 릴리스에서 제거될 수 있으므로 가능한 한 빨리 코드를 변경해 주세요.
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+## HTTP API 추론
+
+추론을 위한 HTTP API를 제공합니다. 다음 명령으로 서버를 시작할 수 있습니다:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> 추론을 가속화하려면 `--compile` 매개변수를 추가할 수 있습니다.
+
+그 후 http://127.0.0.1:8080/ 에서 API를 보고 테스트할 수 있습니다.
+
+## GUI 추론 
+[클라이언트 다운로드](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## WebUI 추론
+
+다음 명령으로 WebUI를 시작할 수 있습니다:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+또는 간단히
+
+```bash
+python -m tools.run_webui
+```
+> 추론을 가속화하려면 `--compile` 매개변수를 추가할 수 있습니다.
+
+!!! note
+    라벨 파일과 참조 오디오 파일을 메인 디렉토리의 `references` 폴더에 미리 저장할 수 있습니다(직접 생성해야 함). 이렇게 하면 WebUI에서 직접 호출할 수 있습니다.
+
+!!! note
+    `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME`과 같은 Gradio 환경 변수를 사용하여 WebUI를 구성할 수 있습니다.
+
+즐기세요!

+ 0 - 50
docs/ko/install.md

@@ -1,50 +0,0 @@
-# 소개
-
-<div>
-<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
-<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
-<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
-<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
-</a>
-</div>
-
-!!! warning
-    코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다. <br/>
-    이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
-
-## 시스템 요구사항
-
-- GPU 메모리: 12GB (추론)
-- 시스템: Linux, Windows
-
-## 설치
-
-먼저 패키지를 설치하기 위한 conda 환경을 만들어야 합니다.
-
-```bash
-
-conda create -n fish-speech python=3.12
-conda activate fish-speech
-
-pip install sudo apt-get install portaudio19-dev # pyaudio용
-pip install -e . # 나머지 모든 패키지를 다운로드합니다.
-
-apt install libsox-dev ffmpeg # 필요한 경우.
-```
-
-!!! warning
-    `compile` 옵션은 Windows와 macOS에서 지원되지 않습니다. compile로 실행하려면 trition을 직접 설치해야 합니다.
-
-## 감사의 말
-
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 35 - 92
docs/pt/index.md

@@ -1,107 +1,50 @@
-# Inferência
+# Introdução
 
-Como o modelo vocoder foi alterado, você precisa de mais VRAM do que antes, sendo recomendado 12GB para inferência fluente.
+<div>
+<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
+<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
+<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
+<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
+</a>
+</div>
 
-Suportamos linha de comando, API HTTP e WebUI para inferência, você pode escolher qualquer método que preferir.
+!!! warning
+    Não assumimos nenhuma responsabilidade pelo uso ilegal da base de código. Consulte as leis locais sobre DMCA (Digital Millennium Copyright Act) e outras leis relevantes em sua área. <br/>
+    Esta base de código é lançada sob a licença Apache 2.0 e todos os modelos são lançados sob a licença CC-BY-NC-SA-4.0.
 
-## Baixar Pesos
+## Requisitos
 
-Primeiro você precisa baixar os pesos do modelo:
+- Memória GPU: 12GB (Inferência)
+- Sistema: Linux, Windows
 
-```bash
-huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
-```
-
-## Inferência por Linha de Comando
-
-!!! note
-    Se você planeja deixar o modelo escolher aleatoriamente um timbre de voz, pode pular esta etapa.
-
-### 1. Obter tokens VQ do áudio de referência
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "ref_audio_name.wav" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
-```
-
-Você deve obter um `fake.npy` e um `fake.wav`.
-
-### 2. Gerar tokens semânticos do texto:
-
-```bash
-python fish_speech/models/text2semantic/inference.py \
-    --text "O texto que você quer converter" \
-    --prompt-text "Seu texto de referência" \
-    --prompt-tokens "fake.npy" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --num-samples 2 \
-    --compile # se você quiser uma velocidade mais rápida
-```
-
-Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é um inteiro começando de 0.
-
-!!! note
-    Você pode querer usar `--compile` para fundir kernels CUDA para inferência mais rápida (~30 tokens/segundo -> ~500 tokens/segundo).
-    Correspondentemente, se você não planeja usar aceleração, pode comentar o parâmetro `--compile`.
-
-!!! info
-    Para GPUs que não suportam bf16, você pode precisar usar o parâmetro `--half`.
+## Configuração
 
-### 3. Gerar vocais a partir de tokens semânticos:
-
-#### Decodificador VQGAN
-
-!!! warning "Aviso Futuro"
-    Mantivemos a interface acessível do caminho original (tools/vqgan/inference.py), mas esta interface pode ser removida em versões subsequentes, então por favor altere seu código o mais breve possível.
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "codes_0.npy" \
-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
-```
-
-## Inferência com API HTTP
-
-Fornecemos uma API HTTP para inferência. Você pode usar o seguinte comando para iniciar o servidor:
+Primeiro, precisamos criar um ambiente conda para instalar os pacotes.
 
 ```bash
-python -m tools.api_server \
-    --listen 0.0.0.0:8080 \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
 
-> Se você quiser acelerar a inferência, pode adicionar o parâmetro `--compile`.
+conda create -n fish-speech python=3.12
+conda activate fish-speech
 
-Depois disso, você pode visualizar e testar a API em http://127.0.0.1:8080/.
+sudo apt-get install portaudio19-dev # Para pyaudio
+pip install -e . # Isso baixará todos os pacotes restantes.
 
-## Inferência GUI 
-[Baixar cliente](https://github.com/AnyaCoder/fish-speech-gui/releases)
-
-## Inferência WebUI
-
-Você pode iniciar o WebUI usando o seguinte comando:
-
-```bash
-python -m tools.run_webui \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
-
-Ou simplesmente
-
-```bash
-python -m tools.run_webui
+apt install libsox-dev ffmpeg # Se necessário.
 ```
-> Se você quiser acelerar a inferência, pode adicionar o parâmetro `--compile`.
 
-!!! note
-    Você pode salvar o arquivo de rótulo e o arquivo de áudio de referência antecipadamente na pasta `references` no diretório principal (que você precisa criar), para que possa chamá-los diretamente no WebUI.
+!!! warning
+    A opção `compile` não é suportada no Windows e macOS; se você quiser executar com compile, precisa instalar o Triton por conta própria.
 
-!!! note
-    Você pode usar variáveis de ambiente do Gradio, como `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME` para configurar o WebUI.
+## Agradecimentos
 
-Divirta-se!
+- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
+- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
+- [GPT VITS](https://github.com/innnky/gpt-vits)
+- [MQTTS](https://github.com/b04901014/MQTTS)
+- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
+- [Transformers](https://github.com/huggingface/transformers)
+- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 107 - 0
docs/pt/inference.md

@@ -0,0 +1,107 @@
+# Inferência
+
+Como o modelo vocoder foi alterado, você precisa de mais VRAM do que antes, sendo recomendado 12GB para inferência fluente.
+
+Suportamos linha de comando, API HTTP e WebUI para inferência, você pode escolher qualquer método que preferir.
+
+## Baixar Pesos
+
+Primeiro você precisa baixar os pesos do modelo:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## Inferência por Linha de Comando
+
+!!! note
+    Se você planeja deixar o modelo escolher aleatoriamente um timbre de voz, pode pular esta etapa.
+
+### 1. Obter tokens VQ do áudio de referência
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+Você deve obter um `fake.npy` e um `fake.wav`.
+
+### 2. Gerar tokens semânticos do texto:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "O texto que você quer converter" \
+    --prompt-text "Seu texto de referência" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # se você quiser uma velocidade mais rápida
+```
+
+Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é um inteiro começando de 0.
+
+!!! note
+    Você pode querer usar `--compile` para fundir kernels CUDA para inferência mais rápida (~30 tokens/segundo -> ~500 tokens/segundo).
+    Correspondentemente, se você não planeja usar aceleração, pode comentar o parâmetro `--compile`.
+
+!!! info
+    Para GPUs que não suportam bf16, você pode precisar usar o parâmetro `--half`.
+
+### 3. Gerar vocais a partir de tokens semânticos:
+
+#### Decodificador VQGAN
+
+!!! warning "Aviso Futuro"
+    Mantivemos a interface acessível do caminho original (tools/vqgan/inference.py), mas esta interface pode ser removida em versões subsequentes, então por favor altere seu código o mais breve possível.
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+## Inferência com API HTTP
+
+Fornecemos uma API HTTP para inferência. Você pode usar o seguinte comando para iniciar o servidor:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> Se você quiser acelerar a inferência, pode adicionar o parâmetro `--compile`.
+
+Depois disso, você pode visualizar e testar a API em http://127.0.0.1:8080/.
+
+## Inferência GUI 
+[Baixar cliente](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## Inferência WebUI
+
+Você pode iniciar o WebUI usando o seguinte comando:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+Ou simplesmente
+
+```bash
+python -m tools.run_webui
+```
+> Se você quiser acelerar a inferência, pode adicionar o parâmetro `--compile`.
+
+!!! note
+    Você pode salvar o arquivo de rótulo e o arquivo de áudio de referência antecipadamente na pasta `references` no diretório principal (que você precisa criar), para que possa chamá-los diretamente no WebUI.
+
+!!! note
+    Você pode usar variáveis de ambiente do Gradio, como `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME` para configurar o WebUI.
+
+Divirta-se!

+ 0 - 50
docs/pt/install.md

@@ -1,50 +0,0 @@
-# Introdução
-
-<div>
-<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
-<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
-<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
-<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
-</a>
-</div>
-
-!!! warning
-    Não assumimos nenhuma responsabilidade pelo uso ilegal da base de código. Consulte as leis locais sobre DMCA (Digital Millennium Copyright Act) e outras leis relevantes em sua área. <br/>
-    Esta base de código é lançada sob a licença Apache 2.0 e todos os modelos são lançados sob a licença CC-BY-NC-SA-4.0.
-
-## Requisitos
-
-- Memória GPU: 12GB (Inferência)
-- Sistema: Linux, Windows
-
-## Configuração
-
-Primeiro, precisamos criar um ambiente conda para instalar os pacotes.
-
-```bash
-
-conda create -n fish-speech python=3.12
-conda activate fish-speech
-
-pip install sudo apt-get install portaudio19-dev # Para pyaudio
-pip install -e . # Isso baixará todos os pacotes restantes.
-
-apt install libsox-dev ffmpeg # Se necessário.
-```
-
-!!! warning
-    A opção `compile` não é suportada no Windows e macOS, se você quiser executar com compile, precisa instalar o trition por conta própria.
-
-## Agradecimentos
-
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 35 - 92
docs/zh/index.md

@@ -1,107 +1,50 @@
-# 推理
+# 简介
 
-由于声码器模型已更改,您需要比以前更多的显存,建议使用12GB显存以便流畅推理。
+<div>
+<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
+<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
+<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
+</a>
+<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
+<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
+</a>
+</div>
 
-我们支持命令行、HTTP API 和 WebUI 进行推理,您可以选择任何您喜欢的方法。
+!!! warning
+    我们不对代码库的任何非法使用承担责任。请参考您所在地区有关 DMCA(数字千年版权法)和其他相关法律的规定。<br/>
+    此代码库在 Apache 2.0 许可证下发布,所有模型在 CC-BY-NC-SA-4.0 许可证下发布。
 
-## 下载权重
+## 系统要求
 
-首先您需要下载模型权重:
+- GPU 内存:12GB(推理)
+- 系统:Linux、Windows
 
-```bash
-huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
-```
-
-## 命令行推理
-
-!!! note
-    如果您计划让模型随机选择音色,可以跳过此步骤。
-
-### 1. 从参考音频获取VQ tokens
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "ref_audio_name.wav" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
-```
-
-您应该会得到一个 `fake.npy` 和一个 `fake.wav`。
-
-### 2. 从文本生成语义tokens:
-
-```bash
-python fish_speech/models/text2semantic/inference.py \
-    --text "您想要转换的文本" \
-    --prompt-text "您的参考文本" \
-    --prompt-tokens "fake.npy" \
-    --checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --num-samples 2 \
-    --compile # 如果您想要更快的速度
-```
-
-此命令将在工作目录中创建一个 `codes_N` 文件,其中N是从0开始的整数。
-
-!!! note
-    您可能想要使用 `--compile` 来融合CUDA内核以获得更快的推理速度(约30 tokens/秒 -> 约500 tokens/秒)。
-    相应地,如果您不打算使用加速,可以删除 `--compile` 参数的注释。
-
-!!! info
-    对于不支持bf16的GPU,您可能需要使用 `--half` 参数。
+## 安装
 
-### 3. 从语义tokens生成人声:
-
-#### VQGAN 解码器
-
-!!! warning "未来警告"
-    我们保留了从原始路径(tools/vqgan/inference.py)访问的接口,但此接口可能在后续版本中被移除,请尽快更改您的代码。
-
-```bash
-python fish_speech/models/dac/inference.py \
-    -i "codes_0.npy" \
-    --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
-```
-
-## HTTP API 推理
-
-我们提供HTTP API进行推理。您可以使用以下命令启动服务器:
+首先,我们需要创建一个 conda 环境来安装包。
 
 ```bash
-python -m tools.api_server \
-    --listen 0.0.0.0:8080 \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
 
-> 如果您想要加速推理,可以添加 `--compile` 参数。
+conda create -n fish-speech python=3.12
+conda activate fish-speech
 
-之后,您可以在 http://127.0.0.1:8080/ 查看和测试API。
+sudo apt-get install portaudio19-dev # 用于 pyaudio
+pip install -e . # 这将下载所有其余的包。
 
-## GUI 推理 
-[下载客户端](https://github.com/AnyaCoder/fish-speech-gui/releases)
-
-## WebUI 推理
-
-您可以使用以下命令启动WebUI:
-
-```bash
-python -m tools.run_webui \
-    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
-    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
-    --decoder-config-name modded_dac_vq
-```
-
-或者简单地
-
-```bash
-python -m tools.run_webui
+apt install libsox-dev ffmpeg # 如果需要的话。
 ```
-> 如果您想要加速推理,可以添加 `--compile` 参数。
 
-!!! note
-    您可以提前将标签文件和参考音频文件保存到主目录的 `references` 文件夹中(需要自己创建),这样就可以在WebUI中直接调用它们
+!!! warning
+    `compile` 选项在 Windows 和 macOS 上不受支持;如果您想使用 compile 运行,需要自行安装 Triton。
 
-!!! note
-    您可以使用Gradio环境变量,如 `GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` 来配置WebUI。
+## 致谢
 
-尽情享受吧!
+- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
+- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
+- [GPT VITS](https://github.com/innnky/gpt-vits)
+- [MQTTS](https://github.com/b04901014/MQTTS)
+- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
+- [Transformers](https://github.com/huggingface/transformers)
+- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 107 - 0
docs/zh/inference.md

@@ -0,0 +1,107 @@
+# 推理
+
+由于声码器模型已更改,您需要比以前更多的显存,建议使用12GB显存以便流畅推理。
+
+我们支持命令行、HTTP API 和 WebUI 进行推理,您可以选择任何您喜欢的方法。
+
+## 下载权重
+
+首先您需要下载模型权重:
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini
+```
+
+## 命令行推理
+
+!!! note
+    如果您计划让模型随机选择音色,可以跳过此步骤。
+
+### 1. 从参考音频获取VQ tokens
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "ref_audio_name.wav" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+您应该会得到一个 `fake.npy` 和一个 `fake.wav`。
+
+### 2. 从文本生成语义tokens:
+
+```bash
+python fish_speech/models/text2semantic/inference.py \
+    --text "您想要转换的文本" \
+    --prompt-text "您的参考文本" \
+    --prompt-tokens "fake.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --num-samples 2 \
+    --compile # 如果您想要更快的速度
+```
+
+此命令将在工作目录中创建一个 `codes_N` 文件,其中N是从0开始的整数。
+
+!!! note
+    您可能想要使用 `--compile` 来融合CUDA内核以获得更快的推理速度(约30 tokens/秒 -> 约500 tokens/秒)。
+    相应地,如果您不打算使用加速,可以移除 `--compile` 参数。
+
+!!! info
+    对于不支持bf16的GPU,您可能需要使用 `--half` 参数。
+
+### 3. 从语义tokens生成人声:
+
+#### VQGAN 解码器
+
+!!! warning "未来警告"
+    我们保留了从原始路径(tools/vqgan/inference.py)访问的接口,但此接口可能在后续版本中被移除,请尽快更改您的代码。
+
+```bash
+python fish_speech/models/dac/inference.py \
+    -i "codes_0.npy" \
+    --checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth"
+```
+
+## HTTP API 推理
+
+我们提供HTTP API进行推理。您可以使用以下命令启动服务器:
+
+```bash
+python -m tools.api_server \
+    --listen 0.0.0.0:8080 \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+> 如果您想要加速推理,可以添加 `--compile` 参数。
+
+之后,您可以在 http://127.0.0.1:8080/ 查看和测试API。
+
+## GUI 推理 
+[下载客户端](https://github.com/AnyaCoder/fish-speech-gui/releases)
+
+## WebUI 推理
+
+您可以使用以下命令启动WebUI:
+
+```bash
+python -m tools.run_webui \
+    --llama-checkpoint-path "checkpoints/openaudio-s1-mini" \
+    --decoder-checkpoint-path "checkpoints/openaudio-s1-mini/codec.pth" \
+    --decoder-config-name modded_dac_vq
+```
+
+或者简单地
+
+```bash
+python -m tools.run_webui
+```
+> 如果您想要加速推理,可以添加 `--compile` 参数。
+
+!!! note
+    您可以提前将标签文件和参考音频文件保存到主目录的 `references` 文件夹中(需要自己创建),这样就可以在WebUI中直接调用它们。
+
+!!! note
+    您可以使用Gradio环境变量,如 `GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` 来配置WebUI。
+
+尽情享受吧!

+ 0 - 50
docs/zh/install.md

@@ -1,50 +0,0 @@
-# 简介
-
-<div>
-<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
-<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
-<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
-</a>
-<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
-<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
-</a>
-</div>
-
-!!! warning
-    我们不对代码库的任何非法使用承担责任。请参考您所在地区有关 DMCA(数字千年版权法)和其他相关法律的规定。<br/>
-    此代码库在 Apache 2.0 许可证下发布,所有模型在 CC-BY-NC-SA-4.0 许可证下发布。
-
-## 系统要求
-
-- GPU 内存:12GB(推理)
-- 系统:Linux、Windows
-
-## 安装
-
-首先,我们需要创建一个 conda 环境来安装包。
-
-```bash
-
-conda create -n fish-speech python=3.12
-conda activate fish-speech
-
-pip install sudo apt-get install portaudio19-dev # 用于 pyaudio
-pip install -e . # 这将下载所有其余的包。
-
-apt install libsox-dev ffmpeg # 如果需要的话。
-```
-
-!!! warning
-    `compile` 选项在 Windows 和 macOS 上不受支持,如果您想使用 compile 运行,需要自己安装 trition。
-
-## 致谢
-
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

+ 6 - 6
inference.ipynb

@@ -61,7 +61,7 @@
     "# !set HF_ENDPOINT=https://hf-mirror.com\n",
     "# !export HF_ENDPOINT=https://hf-mirror.com \n",
     "\n",
-    "!huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/openaudio-s1-mini/"
+    "!huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini/"
    ]
   },
   {
@@ -85,7 +85,7 @@
    "source": [
     "!python tools/run_webui.py \\\n",
     "    --llama-checkpoint-path checkpoints/openaudio-s1-mini \\\n",
-    "    --decoder-checkpoint-path checkpoints/openaudio-s1-mini/firefly-gan-vq-fsq-8x1024-21hz-generator.pth \\\n",
+    "    --decoder-checkpoint-path checkpoints/openaudio-s1-mini/codec.pth \\\n",
     "    # --compile"
    ]
   },
@@ -120,9 +120,9 @@
     "## Enter the path to the audio file here\n",
     "src_audio = r\"D:\\PythonProject\\vo_hutao_draw_appear.wav\"\n",
     "\n",
-    "!python fish_speech/models/vqgan/inference.py \\\n",
+    "!python fish_speech/models/dac/inference.py \\\n",
     "    -i {src_audio} \\\n",
-    "    --checkpoint-path \"checkpoints/openaudio-s1-mini/firefly-gan-vq-fsq-8x1024-21hz-generator.pth\"\n",
+    "    --checkpoint-path \"checkpoints/openaudio-s1-mini/codec.pth\"\n",
     "\n",
     "from IPython.display import Audio, display\n",
     "audio = Audio(filename=\"fake.wav\")\n",
@@ -180,9 +180,9 @@
    },
    "outputs": [],
    "source": [
-    "!python fish_speech/models/vqgan/inference.py \\\n",
+    "!python fish_speech/models/dac/inference.py \\\n",
     "    -i \"codes_0.npy\" \\\n",
-    "    --checkpoint-path \"checkpoints/openaudio-s1-mini/firefly-gan-vq-fsq-8x1024-21hz-generator.pth\"\n",
+    "    --checkpoint-path \"checkpoints/openaudio-s1-mini/codec.pth\"\n",
     "\n",
     "from IPython.display import Audio, display\n",
     "audio = Audio(filename=\"fake.wav\")\n",

+ 5 - 5
mkdocs.yml

@@ -56,7 +56,7 @@ theme:
         code: Roboto Mono
 
 nav:
-  - Installation: en/install.md
+  - Installation: en/index.md
   - Inference: en/inference.md
 
 # Plugins
@@ -80,25 +80,25 @@ plugins:
           name: 简体中文
           build: true
           nav:
-            - 安装: zh/install.md
+            - 安装: zh/index.md
             - 推理: zh/inference.md
         - locale: ja
           name: 日本語
           build: true
           nav:
-            - インストール: ja/install.md
+            - インストール: ja/index.md
             - 推論: ja/inference.md
         - locale: pt
           name: Português (Brasil)
           build: true
           nav:
-            - Instalação: pt/install.md
+            - Instalação: pt/index.md
             - Inferência: pt/inference.md
         - locale: ko
           name: 한국어
           build: true
           nav:
-            - 설치: ko/install.md
+            - 설치: ko/index.md
             - 추론: ko/inference.md
 
 markdown_extensions:

+ 1 - 1
pyproject.toml

@@ -50,7 +50,7 @@ dependencies = [
 
 [project.optional-dependencies]
 stable = [
-    "torch<=2.4.1",
+    "torch>=2.5.1",
     "torchaudio",
 ]
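
A small editor's sketch (not part of this change) of how one might verify at runtime that the installed `torch` satisfies the new `>=2.5.1` floor from the `stable` extra; the helper and its name are illustrative assumptions:

```python
# Illustrative helper: compare a version string such as "2.5.1+cu121"
# against a required minimum, ignoring local build suffixes.
def meets_minimum(installed: str, minimum: str) -> bool:
    def key(version: str) -> tuple:
        core = version.split("+")[0]  # drop local suffix like "+cu121"
        return tuple(int(part) for part in core.split(".")[:3])
    return key(installed) >= key(minimum)


# Example: check the torch that pip resolved for the "stable" extra.
# import torch
# assert meets_minimum(torch.__version__, "2.5.1")
print(meets_minimum("2.5.1+cu121", "2.5.1"))  # True
print(meets_minimum("2.4.1", "2.5.1"))        # False
```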