
Optimize docs

Lengyue · 2 years ago
parent commit c0585bff0f
12 changed files with 142 additions and 36 deletions
  1. README.md (+5 -2)
  2. docs/en/finetune.md (+63 -3)
  3. docs/en/index.md (+1 -0)
  4. docs/en/inference.md (+25 -1)
  5. docs/en/samples.md (+1 -1)
  6. docs/index.md (+0 -4)
  7. docs/requirements.txt (+2 -0)
  8. docs/stylesheets/extra.css (+0 -3)
  9. docs/zh/finetune.md (+12 -10)
  10. docs/zh/index.md (+2 -1)
  11. docs/zh/inference.md (+1 -1)
  12. mkdocs.yml (+30 -10)

+ 5 - 2
README.md

@@ -6,12 +6,15 @@ This codebase is released under BSD-3-Clause License, and all models are release
 
 ## Disclaimer / 免责声明
 We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.  
-我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律的法律.
+我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律法规.
 
 ## Documents / 文档
 - [English](https://speech.fish.audio/en/)
-- [中文](https://speech.fish.audio/zh/)
+- [中文](https://speech.fish.audio/)
 
+## Samples / 例子
+- [English](https://speech.fish.audio/en/samples/)
+- [中文](https://speech.fish.audio/samples/)
 
 ## Credits / 鸣谢
 - [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)

+ 63 - 3
docs/en/finetune.md

@@ -2,7 +2,63 @@
 
 Obviously, when you opened this page, you were not satisfied with the performance of the few-shot pre-trained model. You want to fine-tune a model to improve its performance on your dataset.
 
-`Fish Speech` consists of two modules: `VQGAN` and `LLAMA`. Currently, we only support fine-tuning the `LLAMA` model.
+`Fish Speech` consists of two modules: `VQGAN` and `LLAMA`.
+
+!!! info 
+    You should first conduct the following test to determine if you need to fine-tune `VQGAN`:
+    ```bash
+    python tools/vqgan/inference.py -i test.wav
+    ```
+    This test will generate a `fake.wav` file. If the timbre of this file differs from the speaker's original voice, or if the quality is not high, you need to fine-tune `VQGAN`.
+
+    Similarly, you can refer to [Inference](inference.md) to run `generate.py` and evaluate if the prosody meets your expectations. If it does not, then you need to fine-tune `LLAMA`.
+
+## Fine-tuning VQGAN
+### 1. Prepare the Dataset
+
+```
+.
+├── SPK1
+│   ├── 21.15-26.44.mp3
+│   ├── 27.51-29.98.mp3
+│   └── 30.1-32.71.mp3
+└── SPK2
+    └── 38.79-40.85.mp3
+```
+
+You need to format your dataset as shown above and place it under `data/demo`. Audio files can have `.mp3`, `.wav`, or `.flac` extensions.
+
+### 2. Split Training and Validation Sets
+
+```bash
+python tools/vqgan/create_train_split.py data/demo
+```
+
+This command will create `data/demo/vq_train_filelist.txt` and `data/demo/vq_val_filelist.txt` in the `data/demo` directory, to be used for training and validation respectively.
+
+!!! info
+    For the VITS format, you can specify a file list using `--filelist xxx.list`.
+    Please note that the audio files in `filelist` must also be located in the `data/demo` folder.
+
+### 3. Start Training
+
+```bash
+python fish_speech/train.py --config-name vqgan_finetune
+```
+
+!!! note
+    You can modify training parameters by editing `fish_speech/configs/vqgan_finetune.yaml`, but in most cases, this won't be necessary.
+
+### 4. Test the Audio
+    
+```bash
+python tools/vqgan/inference.py -i test.wav --checkpoint-path results/vqgan_finetune/checkpoints/step_000010000.ckpt
+```
+
+You can review `fake.wav` to assess the fine-tuning results.
+
+!!! note
+    You may also try other checkpoints. We suggest using the earliest checkpoint that meets your requirements, as they often perform better on out-of-distribution (OOD) data.
 
 ## Fine-tuning LLAMA
 ### 1. Prepare the dataset
@@ -26,7 +82,7 @@ You need to convert your dataset into the above format and place it under `data/
 !!! note
     You can modify the dataset path and mix datasets by modifying `fish_speech/configs/data/finetune.yaml`.
 
-### 2. Batch-wise extraction of semantic tokens
+### 2. Batch extraction of semantic tokens
 
 Make sure you have downloaded the VQGAN weights. If not, run the following command:
 
@@ -76,6 +132,10 @@ python tools/llama/build_dataset.py \
 
 After the command finishes executing, you should see the `quantized-dataset-ft.protos` file in the `data` directory.
 
+!!! info
+    For the VITS format, you can specify a file list using `--filelist xxx.list`.
+    Please note that the audio files referenced in `filelist` must also be located in the `data/demo` folder.
+
 ### 4. Start the Rust data server
 
 Loading and shuffling the dataset is very slow and memory-consuming. Therefore, we use a Rust server to load and shuffle the data. This server is based on GRPC and can be installed using the following method:
@@ -112,7 +172,7 @@ python fish_speech/train.py --config-name text2semantic_finetune_spk
 !!! note
     You can modify the training parameters such as `batch_size`, `gradient_accumulation_steps`, etc. to fit your GPU memory by modifying `fish_speech/configs/text2semantic_finetune_spk.yaml`.
 
-After training is complete, you can refer to the inference section to generate speech.
+After training is complete, you can refer to the [inference](inference.md) section, and use `--speaker SPK1` to generate speech.
 
 !!! info
     By default, the model will only learn the speaker's speech patterns and not the timbre. You still need to use prompts to ensure timbre stability.
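The train/validation split created by `tools/vqgan/create_train_split.py` (step 2 of the VQGAN guide above) can be sketched in a few lines of Python. This is an illustrative approximation of what the script produces, not the script itself; its actual split ratio, shuffling, and CLI flags may differ:

```python
import random
from pathlib import Path

def create_train_split(root, val_ratio=0.1, seed=42):
    """Collect audio files under `root` and write train/val filelists.

    Mirrors the documented output of tools/vqgan/create_train_split.py:
    vq_train_filelist.txt and vq_val_filelist.txt, one audio path per line.
    The ratio and seed here are assumptions for illustration.
    """
    root = Path(root)
    # The docs list .mp3, .wav, and .flac as supported extensions.
    files = sorted(
        p for p in root.rglob("*")
        if p.suffix in {".mp3", ".wav", ".flac"}
    )
    random.Random(seed).shuffle(files)
    n_val = max(1, int(len(files) * val_ratio))
    val, train = files[:n_val], files[n_val:]
    (root / "vq_train_filelist.txt").write_text("\n".join(map(str, train)))
    (root / "vq_val_filelist.txt").write_text("\n".join(map(str, val)))
    return len(train), len(val)
```

Pointing this at a `data/demo` tree like the one shown in step 1 yields the two filelists that training and validation read from.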

+ 1 - 0
docs/en/index.md

@@ -33,6 +33,7 @@ pip3 install -e .
 
 ## Changelog
 
+- 2023/12/19: Updated webui and HTTP API.
 - 2023/12/18: Updated fine-tuning documentation and related examples.
 - 2023/12/17: Updated `text2semantic` model, supporting phoneme-free mode.
 - 2023/12/13: Beta version released, includes VQGAN model and a language model based on LLAMA (phoneme support only).

+ 25 - 1
docs/en/inference.md

@@ -1,6 +1,6 @@
 # Inference
 
-In the plan, inference is expected to support both command line and webui methods, but currently, only the command-line reasoning function has been completed.  
+Inference supports the command line, an HTTP API, and a WebUI.
 
 !!! note
     Overall, inference consists of several parts:
@@ -57,3 +57,27 @@ python tools/vqgan/inference.py \
     -i "codes_0.npy" \
     --checkpoint-path "checkpoints/vqgan-v1.pth"
 ```
+
+## HTTP API Inference
+
+We provide an HTTP API for inference. You can start the server with the following command:
+
+```bash
+python -m zibai tools.api_server:app --listen 127.0.0.1:8000
+```
+
+After that, you can view and test the API at http://127.0.0.1:8000/docs.  
+
+Generally, you first need to call `PUT /v1/models/default` to load the model, and then use `POST /v1/models/default/invoke` for inference. For specific parameters, please refer to the API documentation.
+
+## WebUI Inference
+
+Before running the WebUI, you need to start the HTTP service as described above.
+
+Then you can start the WebUI using the following command:
+
+```bash
+python fish_speech/webui/app.py
+```
+
+Enjoy!
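With the server from the `zibai` command above running, the two documented calls (`PUT /v1/models/default`, then `POST /v1/models/default/invoke`) can be scripted from the standard library. This is a hedged sketch: the endpoints come from the docs, but the `"text"` payload field is an assumption; the real request schema is on the server's `/docs` page.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # address passed to --listen above

def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Build a JSON request for the inference server.

    Payload field names are hypothetical; consult the interactive docs
    at http://127.0.0.1:8000/docs for the real schema.
    """
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        API_BASE + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # 1. Load the default model.
    urllib.request.urlopen(build_request("PUT", "/v1/models/default"))
    # 2. Run inference ("text" is an assumed field name).
    with urllib.request.urlopen(
        build_request("POST", "/v1/models/default/invoke", {"text": "Hello"})
    ) as resp:
        print(resp.read())
```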

+ 1 - 1
docs/en/samples.md

@@ -1,4 +1,4 @@
-# Example
+# Samples
 
 !!! note
     Due to insufficient Japanese to English training data, we first phonemicize the text and then use it for generation.

+ 0 - 4
docs/index.md

@@ -1,4 +0,0 @@
----
-template: redirect.html
-location: ./zh/
----

+ 2 - 0
docs/requirements.txt

@@ -1 +1,3 @@
 mkdocs-material
+mkdocs-static-i18n[material]
+mkdocs[i18n]

+ 0 - 3
docs/stylesheets/extra.css

@@ -1,6 +1,3 @@
 .md-grid {
   max-width: 1440px; 
 }
-.md-tabs {
-  display: none;
-}

+ 12 - 10
docs/zh/finetune.md

@@ -11,7 +11,7 @@
     ```
     该测试会生成一个 `fake.wav` 文件, 如果该文件的音色和说话人的音色不同, 或者质量不高, 你需要微调 `VQGAN`.
 
-    相应的, 你可以参考 [推理](../inference/) 来运行 `generate.py`, 判断韵律是否满意, 如果不满意, 则需要微调 `LLAMA`.
+    相应的, 你可以参考 [推理](inference.md) 来运行 `generate.py`, 判断韵律是否满意, 如果不满意, 则需要微调 `LLAMA`.
 
 ## VQGAN 微调
 ### 1. 准备数据集
@@ -19,18 +19,14 @@
 ```
 .
 ├── SPK1
-│   ├── 21.15-26.44.lab
 │   ├── 21.15-26.44.mp3
-│   ├── 27.51-29.98.lab
 │   ├── 27.51-29.98.mp3
-│   ├── 30.1-32.71.lab
 │   └── 30.1-32.71.mp3
 └── SPK2
-    ├── 38.79-40.85.lab
     └── 38.79-40.85.mp3
 ```
 
-你需要将数据集转为以上格式, 并放到 `data/demo` 下, 音频后缀可以为 `.mp3`, `.wav` 或 `.flac`, 标注文件后缀可以为 `.lab` 或 `.txt`.
+你需要将数据集转为以上格式, 并放到 `data/demo` 下, 音频后缀可以为 `.mp3`, `.wav` 或 `.flac`.
 
 ### 2. 分割训练集和验证集
 
@@ -38,7 +34,6 @@
 python tools/vqgan/create_train_split.py data/demo
 ```
 
-
 该命令会在 `data/demo` 目录下创建 `data/demo/vq_train_filelist.txt` 和 `data/demo/vq_val_filelist.txt` 文件, 分别用于训练和验证.  
 
 !!! info
@@ -100,10 +95,13 @@ python tools/vqgan/inference.py -i test.wav --checkpoint-path results/vqgan_fine
 ```bash
 huggingface-cli download fishaudio/speech-lm-v1 vqgan-v1.pth --local-dir checkpoints
 ```
-对于中国大陆用户,可使用mirror下载。
+
+对于中国大陆用户, 可使用 mirror 下载.
+
 ```bash
 HF_ENDPOINT=https://hf-mirror.com huggingface-cli download fishaudio/speech-lm-v1 vqgan-v1.pth --local-dir checkpoints
 ```
+
 随后可运行以下命令来提取语义 token:
 
 ```bash
@@ -178,11 +176,15 @@ data_server/target/release/data_server \
 ```bash
 huggingface-cli download fishaudio/speech-lm-v1 text2semantic-400m-v0.2-4k.pth --local-dir checkpoints
 ```
-对于中国大陆用户,可使用mirror下载。
+
+对于中国大陆用户, 可使用 mirror 下载.
+
 ```bash
 HF_ENDPOINT=https://hf-mirror.com huggingface-cli download fishaudio/speech-lm-v1 text2semantic-400m-v0.2-4k.pth --local-dir checkpoints
 ```
+
 最后, 你可以运行以下命令来启动微调:
+
 ```bash
 python fish_speech/train.py --config-name text2semantic_finetune_spk
 ```
@@ -190,7 +192,7 @@ python fish_speech/train.py --config-name text2semantic_finetune_spk
 !!! note
     你可以通过修改 `fish_speech/configs/text2semantic_finetune_spk.yaml` 来修改训练参数如 `batch_size`, `gradient_accumulation_steps` 等, 来适应你的显存.
 
-训练结束后, 你可以参考推理部分, 并携带 `--speaker SPK1` 参数来测试你的模型.
+训练结束后, 你可以参考 [推理](inference.md) 部分, 并携带 `--speaker SPK1` 参数来测试你的模型.
 
 !!! info
     默认配置下, 基本只会学到说话人的发音方式, 而不包含音色, 你依然需要使用 prompt 来保证音色的稳定性.  

+ 2 - 1
docs/zh/index.md

@@ -1,7 +1,7 @@
 # 介绍
 
 !!! warning
-    我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律的法律.
+    我们不对代码库的任何非法使用承担任何责任. 请参阅您当地关于 DMCA (数字千年法案) 和其他相关法律法规.
 
 此代码库根据 `BSD-3-Clause` 许可证发布, 所有模型根据 CC-BY-NC-SA-4.0 许可证发布.
 
@@ -33,6 +33,7 @@ pip3 install -e .
 
 ## 更新日志
 
+- 2023/12/19: 更新了 Webui 和 HTTP API.
 - 2023/12/18: 更新了微调文档和相关例子.
 - 2023/12/17: 更新了 `text2semantic` 模型, 支持无音素模式.
 - 2023/12/13: 测试版发布, 包含 VQGAN 模型和一个基于 LLAMA 的语言模型 (只支持音素).

+ 1 - 1
docs/zh/inference.md

@@ -1,6 +1,6 @@
 # 推理
 
-计划中, 推理支持命令行, http api, 以及 webui 三种方式.  
+推理支持命令行, http api, 以及 webui 三种方式.  
 
 !!! note
     总的来说, 推理分为几个部分:  

+ 30 - 10
mkdocs.yml

@@ -1,14 +1,23 @@
 site_name: Fish Speech
+site_description: Targeting SOTA TTS solutions.
+site_url: https://speech.fish.audio
+
+# Repository
+repo_name: fishaudio/fish-speech
 repo_url: https://github.com/fishaudio/fish-speech
+edit_uri: blob/main/docs
+
+# Copyright
+copyright: Copyright © 2023-2024 by Fish Audio
 
 theme:
   name: material
   language: en
   features:
-    - navigation.instant
-    - navigation.instant.prefetch
+    - content.action.edit
+    - content.action.view
     - navigation.tracking
-    - navigation.tabs
+    # - navigation.tabs
     - search
     - search.suggest
     - search.highlight
@@ -44,13 +53,24 @@ theme:
   
 extra:
   homepage: https://speech.fish.audio
-  alternate:
-    - name: English
-      link: /en/ 
-      lang: en
-    - name: 中文
-      link: /zh/
-      lang: zh
+
+# Plugins
+plugins:
+  - search:
+      separator: '[\s\-,:!=\[\]()"`/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])'
+      lang:
+        - zh
+        - en
+  - i18n:
+      docs_structure: folder
+      languages:
+        - locale: en
+          name: English
+          build: true
+        - locale: zh
+          default: true
+          name: 简体中文
+          build: true
 
 markdown_extensions:
   - pymdownx.highlight:
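The search `separator` regex added to `mkdocs.yml` above can be sanity-checked outside MkDocs. Below is a sketch using Python's `re` module; note that Material's search worker tokenizes in JavaScript, so edge-case behavior may differ slightly:

```python
import re

# Same separator as in mkdocs.yml: splits on whitespace/punctuation,
# on dots not followed by a digit (so "v1.2" survives), on the HTML
# entities &lt;/&gt;, and at camelCase boundaries inside a word.
SEPARATOR = r'[\s\-,:!=\[\]()"`/]+|\.(?!\d)|&[lg]t;|(?!\b)(?=[A-Z][a-z])'

def tokenize(text: str) -> list[str]:
    # re.split keeps empty strings around zero-width matches; drop them.
    return [t for t in re.split(SEPARATOR, text) if t]

print(tokenize("fish-speech v1.2 FineTune"))
print(tokenize("docs/en/finetune.md"))
```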