1 year ago · 04b6c1029a
--- a/docs/en/index.md
+++ b/docs/en/index.md
@@ -13,9 +13,8 @@
 </div>
 
 !!! warning
-We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
-
-This codebase is released under the `BSD-3-Clause` license, and all models are released under the CC-BY-NC-SA-4.0 license.
+    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area. <br/>
+    This codebase is released under the `BSD-3-Clause` license, and all models are released under the CC-BY-NC-SA-4.0 license.
 
 <p align="center">
    <img src="../assets/figs/diagram.png" width="75%">
--- a/docs/en/inference.md
+++ b/docs/en/inference.md
@@ -54,7 +54,7 @@ This command will create a `codes_N` file in the working directory, where N is a
 
 ### 3. Generate vocals from semantic tokens:
 
-#### VQGAN Decoder (not recommended)
+#### VQGAN Decoder
 
 ```bash
 python tools/vqgan/inference.py \
--- a/docs/ja/index.md
+++ b/docs/ja/index.md
@@ -13,9 +13,8 @@
 </div>
 
 !!! warning
-私たちは、コードベースの違法な使用について一切の責任を負いません。お住まいの地域の DMCA（デジタルミレニアム著作権法）およびその他の関連法については、現地の法律を参照してください。
-
-このコードベースは `BSD-3-Clause` ライセンスの下でリリースされており、すべてのモデルは CC-BY-NC-SA-4.0 ライセンスの下でリリースされています。
+    私たちは、コードベースの違法な使用について一切の責任を負いません。お住まいの地域の DMCA（デジタルミレニアム著作権法）およびその他の関連法については、現地の法律を参照してください。 <br/>
+    このコードベースは `BSD-3-Clause` ライセンスの下でリリースされており、すべてのモデルは CC-BY-NC-SA-4.0 ライセンスの下でリリースされています。
 
 <p align="center">
    <img src="../assets/figs/diagram.png" width="75%">
--- a/docs/ja/inference.md
+++ b/docs/ja/inference.md
@@ -50,15 +50,11 @@ python tools/llama/generate.py \
     それに対応して、加速を使用しない場合は、`--compile`パラメータをコメントアウトできます。
 
 !!! info
-<<<<<<< HEAD
-    bf16をサポートしていないGPUの場合、`--half`パラメータを使用する必要があるかもしれません。
-=======
     bf16 をサポートしていない GPU の場合、`--half`パラメータを使用する必要があるかもしれません。
->>>>>>> upstream/main
 
 ### 3. セマンティックトークンから音声を生成する：
 
-#### VQGAN デコーダー（推奨されません）
+#### VQGAN デコーダー
 
 ```bash
 python tools/vqgan/inference.py \
--- a/docs/zh/index.md
+++ b/docs/zh/index.md
@@ -13,9 +13,8 @@
 </div>
 
 !!! warning
-我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律法规.
-
-此代码库根据 `BSD-3-Clause` 许可证发布, 所有模型根据 CC-BY-NC-SA-4.0 许可证发布.
+    我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律法规. <br/>
+    此代码库根据 `BSD-3-Clause` 许可证发布, 所有模型根据 CC-BY-NC-SA-4.0 许可证发布.
 
 <p align="center">
    <img src="../assets/figs/diagram.png" width="75%">
@@ -51,37 +50,27 @@ Windows 非专业用户可考虑以下为免 Linux 环境的基础运行方法
         - [Visual Studio 下载](https://visualstudio.microsoft.com/zh-hans/downloads/)
         - 安装好Visual Studio Installer之后，下载Visual Studio Community 2022
         - 如下图点击`修改`按钮，找到`使用C++的桌面开发`项，勾选下载
-<p align="center">
-   <img src="https://s2.loli.net/2024/07/15/pWdlYXNAMIzb8Lq.png" width="60%">
-</p>
-4. 双击 `start.bat`，进入 Fish-Speech 训练推理配置 WebUI 页面。
-    - (可选) 想直接进入推理页面？编辑项目根目录下的
-    -  进入网页后：
-
-<p align="center">
-  <img src="https://s2.loli.net/2024/05/06/gw2L39Qj4mClJSG.png" width="75%">
-</p>
-
-   -  简单说一下各部分区域构成，如下图所示，方便按图索骥：
-
-<p align="center">
-  <img src="https://s2.loli.net/2024/05/06/NvfsgyRZCSk72MG.png" width="75%">
-</p>
+    4. 下载安装 [CUDA Toolkit 12](https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64)
+4. 双击 `start.bat` 打开训练推理WebUI管理界面. 如有需要，可照下列提示修改`API_FLAGS`.
+   
+!!! info "可选"
 
-   -  **1** banner（横幅）：进入网页后从左到右逐渐显示"Welcome to Fish-Speech"字样。以后可能变动。
-   -  **2** 功能区: 在这里，你将决定数据集文件的来源，文本标签的修改，训练参数的调整、推理页面的设置。
-   -  **3** 文件信息展示区：一般不可更改。指引你如何找到自己的预处理后的数据文件、训练后的模型文件所在路径。
-   -  **4** 版本/作者信息。可以多多支持一下作者。
-   -  **5** 欢迎更好的动效~
+    想启动 推理 WebUI 界面？编辑项目根目录下的 `API_FLAGS.txt`, 前三行修改成如下格式:
+    ```
+    --infer
+    # --api
+    # --listen ...
+    ...
+    ```
 
 !!! info "可选"
 
     想启动 API 服务器？编辑项目根目录下的 `API_FLAGS.txt`, 前三行修改成如下格式:
     ```
     # --infer
-        --api
-        --listen ...
-        ...
+    --api
+    --listen ...
+    ...
     ```
 
 !!! info "可选"
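Note on the `API_FLAGS.txt` hunks above: the docs ask users to switch between the inference WebUI and the API server by hand-editing the first three lines of the file. A minimal sketch of that toggle follows, assuming one flag per line and that a leading `#` disables a flag, as the documented examples suggest. `set_mode`, the `--listen` address, and `--some-other-flag` are hypothetical placeholders for illustration, not part of Fish-Speech.

```python
# Flags that each mode leaves uncommented (taken from the documented examples).
MODE_FLAGS = {
    "infer": {"--infer"},
    "api": {"--api", "--listen"},
}
# Only these flags are ever commented or uncommented; other lines pass through.
TOGGLED = {"--infer", "--api", "--listen"}


def set_mode(lines, mode):
    """Return a copy of `lines` with only the flags for `mode` left uncommented."""
    enabled = MODE_FLAGS[mode]
    result = []
    for line in lines:
        bare = line.lstrip("# ").rstrip()  # drop any leading comment marker
        name = bare.split()[0] if bare else ""
        if name in TOGGLED:
            line = bare if name in enabled else "# " + bare
        result.append(line)
    return result


# Hypothetical file contents; the real --listen value is elided in the docs.
flags = ["# --infer", "--api", "--listen 127.0.0.1:8080", "--some-other-flag"]
print("\n".join(set_mode(flags, "infer")))
```

Keeping the helper a pure function over a list of lines leaves file reading and writing to the caller, so the result can be inspected before `API_FLAGS.txt` is overwritten.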
			
--- a/docs/zh/inference.md
+++ b/docs/zh/inference.md
@@ -58,10 +58,7 @@ python tools/llama/generate.py \
 !!! info
     对于不支持 bf16 的 GPU, 你可能需要使用 `--half` 参数.
 
-<<<<<<< HEAD
-=======
 ### 3. 从语义 token 生成人声:
->>>>>>> upstream/main
 
 #### VQGAN 解码
 
@@ -81,11 +78,12 @@ python -m tools.api \
     --llama-checkpoint-path "checkpoints/fish-speech-1.2-sft" \
     --decoder-checkpoint-path "checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth" \
     --decoder-config-name firefly_gan_vq
+```
+如果你想要加速推理，可以加上`--compile`参数。
 
-如果你想要加速推理，可以加上--compile参数。
-
-# 推荐中国大陆用户运行以下命令来启动 HTTP 服务:
-HF_ENDPOINT=https://hf-mirror.com python -m ...
+推荐中国大陆用户运行以下命令来启动 HTTP 服务:
+```bash
+HF_ENDPOINT=https://hf-mirror.com python -m ...(同上)
 ```
 
 随后, 你可以在 `http://127.0.0.1:8080/` 中查看并测试 API.
--- a/inference.ipynb
+++ b/inference.ipynb
@@ -11,7 +11,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### For Windows"
+    "### For Windows User / win用户"
    ]
   },
   {
@@ -31,7 +31,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### For Linux"
+    "### For Linux User / Linux 用户"
    ]
   },
   {
@@ -96,9 +96,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1. Encode reference audio\n",
+    "### 1. Encode reference audio: / 从语音生成 prompt: \n",
     "\n",
-    "You should get a `fake.npy` by doing this."
+    "You should get a `fake.npy` file.\n",
+    "\n",
+    "你应该能得到一个 `fake.npy` 文件."
    ]
   },
   {
@@ -127,10 +129,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 2. Generate semantic tokens from text:\n",
-    "> This command will create codes_N files in the working directory, where N is an integer starting from 0.\n",
+    "### 2. Generate semantic tokens from text: / 从文本生成语义 token:\n",
+    "\n",
+    "> This command will create a codes_N file in the working directory, where N is an integer starting from 0.\n",
+    "\n",
+    "> You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~300 tokens/second).\n",
+    "\n",
+    "> 该命令会在工作目录下创建 codes_N 文件, 其中 N 是从 0 开始的整数.\n",
     "\n",
-    "> You can use --compile to fuse CUDA kernels for faster inference."
+    "> 您可以使用 `--compile` 来融合 cuda 内核以实现更快的推理 (~30 tokens/秒 -> ~300 tokens/秒)"
    ]
   },
   {
@@ -156,7 +163,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 3. Generate speech from semantic tokens:"
+    "### 3. Generate speech from semantic tokens: / 从语义 token 生成人声:"
    ]
   },
   {