Lengyue 2 years ago
Parent
Commit 9bcaf3d02b
2 changed files with 92 additions and 1 deletion
  1. +31 -1 README.md
  2. +61 -0 README.zh.md

+ 31 - 1
README.md

@@ -1,6 +1,17 @@
 # Fish Speech
 
-This repo is still under construction. Please check back later.
+[中文文档](README.zh.md)
+
+This codebase is released under the BSD-3-Clause License, and all models are released under the CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](LICENSE) for more details.
+
+## Disclaimer
+We are not responsible for any illegal use of this codebase. Please refer to your local laws regarding the DMCA and other related laws.
+
+## Requirements
+- GPU memory: 4GB (for inference), 24GB (for finetuning)
+- System: Linux (full functionality), Windows (inference only; flash-attn and torch.compile are not supported)
+
+We therefore strongly recommend that Windows users use WSL2 or Docker to run the codebase.
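As a sketch only, the VRAM minimums above can be checked programmatically; the helpers below are hypothetical (not part of this repo) and just parse a line of `nvidia-smi --query-gpu=memory.total --format=csv,noheader` output:

```python
import re

# Hypothetical helper (not part of this repo): check the VRAM minimums
# quoted above (4 GB for inference, 24 GB for finetuning) against a line
# of `nvidia-smi` output such as "24576 MiB".
def parse_vram_gib(smi_line: str) -> float:
    """Extract total GPU memory in GiB from a line like '24576 MiB'."""
    match = re.search(r"(\d+)\s*MiB", smi_line)
    if not match:
        raise ValueError(f"no memory figure found in: {smi_line!r}")
    return int(match.group(1)) / 1024

def meets_requirement(smi_line: str, task: str = "inference") -> bool:
    """Compare reported VRAM against the documented minimums."""
    minimums = {"inference": 4.0, "finetuning": 24.0}
    return parse_vram_gib(smi_line) >= minimums[task]
```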
 
 
 ## Setup
 ```bash
@@ -16,6 +27,25 @@ pip3 install ninja && MAX_JOBS=4 pip3 install flash-attn --no-build-isolation
 pip3 install -e .
 ```
 
+## Inference (CLI)
+Download the required `vqgan` and `text2semantic` models from our huggingface repo.
+
+```bash
+TODO
+```
+
+Generate semantic tokens from text:
+```bash
+python tools/llama/generate.py
+```
+
+You may want to use `--compile` to fuse CUDA kernels for faster inference (~25 tokens/sec -> ~300 tokens/sec).
+
+Generate vocals from semantic tokens:
+```bash
+python tools/vqgan/inference.py -i codes_0.npy
+```
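The two steps above can be chained in a small driver script. The sketch below only assembles the command lines shown in this README; the `build_pipeline` helper is hypothetical, not a repo API, and `codes_0.npy` is the intermediate file `generate.py` hands to the vocoder:

```python
import subprocess
import sys

# Hypothetical driver (not part of the repo): chain the two CLI steps above.
def build_pipeline(use_compile: bool = False) -> list[list[str]]:
    """Return the command lines for text -> semantic tokens -> audio."""
    generate = [sys.executable, "tools/llama/generate.py"]
    if use_compile:
        # fuse CUDA kernels: ~25 tokens/sec -> ~300 tokens/sec
        generate.append("--compile")
    vocode = [sys.executable, "tools/vqgan/inference.py", "-i", "codes_0.npy"]
    return [generate, vocode]

def run_pipeline(use_compile: bool = False) -> None:
    """Execute both stages in order, failing fast on a non-zero exit."""
    for cmd in build_pipeline(use_compile):
        subprocess.run(cmd, check=True)
```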
+
 ## Rust Data Server
 Since loading and shuffling the dataset is very slow and memory-consuming, we use a Rust server to load and shuffle the dataset. The server is based on gRPC and can be installed by
 

+ 61 - 0
README.zh.md

@@ -0,0 +1,61 @@
+# Fish Speech
+
+This codebase is released under the BSD-3-Clause License, and all models are released under the CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](LICENSE) for more details.
+
+## Disclaimer
+We are not responsible for any illegal use of this codebase. Please refer to your local laws regarding the DMCA and other related laws.
+
+## Requirements
+- GPU memory: 4GB (for inference), 24GB (for finetuning)
+- System: Linux (full functionality), Windows (inference only; flash-attn and torch.compile are not supported)
+
+We therefore strongly recommend that Windows users use WSL2 or Docker to run the codebase.
+
+## Setup
+```bash
+# Basic environment setup
+conda create -n fish-speech python=3.10
+conda activate fish-speech
+conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
+
+# Install flash-attn (for linux)
+pip3 install ninja && MAX_JOBS=4 pip3 install flash-attn --no-build-isolation
+
+# Install fish-speech
+pip3 install -e .
+```
+
+## Inference (CLI)
+
+Download the required `vqgan` and `text2semantic` models from our huggingface repo.
+
+```bash
+TODO
+```
+
+Generate semantic tokens from text:
+```bash
+python tools/llama/generate.py
+```
+
+You may want to use `--compile` to fuse CUDA kernels for faster inference (~25 tokens/sec -> ~300 tokens/sec).
+
+Generate vocals from semantic tokens:
+```bash
+python tools/vqgan/inference.py -i codes_0.npy
+```
+
+## Rust Data Server
+Since loading and shuffling the dataset is very slow and memory-consuming, we use a Rust server to load and shuffle the dataset. The server is based on gRPC and can be installed by
+
+```bash
+cd data_server
+cargo build --release
+```
+
+## Acknowledgements
+- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
+- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
+- [GPT VITS](https://github.com/innnky/gpt-vits)
+- [MQTTS](https://github.com/b04901014/MQTTS)
+- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)