S2 beta (#1166)

* Update to support S2 model.

* fix gradio webui bug.

* update docs and license for S2 Model.

* Fix torch.compile and DAC bugs.

* Fix LICENSE.

* fix pyproject.toml bug.

* [fix]:fix hf style ckpt load problem

* [fix]:fix docker and docs

* [docs]:Add docs

* [docs]:update readme and docs

* fix typo

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [fix]fix readme

* [docs]:fix typo

* [docs]: bold sglang server

---------

Co-authored-by: PoTaTo-Mika <1228427403@qq.com>
Co-authored-by: Whale-Dolphin <whaledolphin@fish.audio>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: PoTaTo <148920650+PoTaTo-Mika@users.noreply.github.com>
Leng Yue, 1 month ago
Parent
Current commit
e5b8236173
12 files changed, with 93 insertions and 18 deletions
  1. README.md (+2 -1)
  2. docs/README.ar.md (+2 -1)
  3. docs/README.ja.md (+2 -1)
  4. docs/README.ko.md (+2 -1)
  5. docs/README.pt-BR.md (+2 -1)
  6. docs/README.zh.md (+2 -1)
  7. docs/ar/index.md (+14 -2)
  8. docs/en/index.md (+10 -1)
  9. docs/ja/index.md (+14 -2)
  10. docs/ko/index.md (+14 -2)
  11. docs/pt/index.md (+14 -2)
  12. docs/zh/index.md (+15 -3)

+ 2 - 1
README.md

@@ -59,7 +59,8 @@ Here are the official documents for Fish Audio S2, follow the instructions to ge
 - [Server Inference](https://speech.fish.audio/server/)
 - [Docker Setup](https://speech.fish.audio/install/#docker-setup)
 
-For SGLang server, please read [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md.)
+> [!IMPORTANT]
+> **For SGLang server, please read [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md).**
 
 ### For LLM Agent
 

+ 2 - 1
docs/README.ar.md

@@ -59,7 +59,8 @@
 - [الاستدلال عبر الخادم](https://speech.fish.audio/ar/server/)
 - [إعداد Docker](https://speech.fish.audio/ar/install/)
 
-بالنسبة لخادم SGLang، راجع [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md.).
+> [!IMPORTANT]
+> **بالنسبة لخادم SGLang، راجع [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md).**
 
 ### دليل وكلاء LLM
 

+ 2 - 1
docs/README.ja.md

@@ -59,7 +59,8 @@ Fish Audio S2 の公式ドキュメントです。以下からすぐに始めら
 - [サーバー推論](https://speech.fish.audio/ja/server/)
 - [Docker セットアップ](https://speech.fish.audio/ja/install/)
 
-SGLang サーバーについては [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md.) を参照してください。
+> [!IMPORTANT]
+> **SGLang サーバーについては [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md) を参照してください。**
 
 ### LLM Agent 向け
 

+ 2 - 1
docs/README.ko.md

@@ -59,7 +59,8 @@ Fish Audio S2 공식 문서입니다. 아래 링크에서 바로 시작할 수 
 - [서버 추론](https://speech.fish.audio/ko/server/)
 - [Docker 설정](https://speech.fish.audio/ko/install/)
 
-SGLang 서버는 [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md.)를 참고하세요.
+> [!IMPORTANT]
+> **SGLang 서버는 [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md)를 참고하세요.**
 
 ### LLM Agent 가이드
 

+ 2 - 1
docs/README.pt-BR.md

@@ -59,7 +59,8 @@ Esta é a documentação oficial do Fish Audio S2. Você pode começar por aqui:
 - [Inferência via Servidor](https://speech.fish.audio/pt/server/)
 - [Configuração Docker](https://speech.fish.audio/pt/install/)
 
-Para servidor com SGLang, consulte o [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md.).
+> [!IMPORTANT]
+> **Para servidor com SGLang, consulte o [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md).**
 
 ### Guia para agentes LLM
 

+ 2 - 1
docs/README.zh.md

@@ -60,7 +60,8 @@
 - [服务端推理](https://speech.fish.audio/zh/server/)
 - [Docker 部署](https://speech.fish.audio/zh/install/)
 
-如需 SGLang Server,请参考 [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md.)。
+> [!IMPORTANT]
+> **如需使用 SGLang Server,请参考 [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md)。**
 
 ### LLM Agent 指南
 

+ 14 - 2
docs/ar/index.md

@@ -52,7 +52,19 @@
 هذا هو الوثائق الرسمية لـ Fish Speech. يرجى اتباع التعليمات للبدء بسهولة.
 
 - [التثبيت](install.md)
-- [الاستنتاج](inference.md)
+- [الاستدلال عبر سطر الأوامر](inference.md)
+- [استدلال WebUI](inference.md)
+- [الاستدلال عبر الخادم](server.md)
+- [إعداد Docker](install.md)
+
+!!! note
+    بالنسبة لخادم SGLang، راجع [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md).
+
+### دليل وكلاء LLM
+
+```text
+قم بتثبيت وإعداد Fish Audio S2 باتباع التعليمات في https://speech.fish.audio/ar/install/ .
+```
 
 ## Fish Audio S2
 **أفضل نظام لتحويل النص إلى كلام في كل من المصادر المفتوحة والمغلقة**
@@ -70,7 +82,6 @@ Fish Audio S2 هو أحدث نموذج تم تطويره بواسطة [Fish Audi
 | النموذج | الحجم | التوفر | الوصف |
 |------|------|-------------|-------------|
 | S2-Pro | 4B معاملات | [huggingface](https://huggingface.co/fishaudio/s2-pro) | نموذج رائد بكامل الميزات مع أعلى جودة واستقرار |
-| S2-Flash | - - - - | [fish.audio](https://fish.audio/) | نموذجنا المغلق المصدر بسرعات أعلى وزمن وصول أقل |
 
 لمزيد من التفاصيل حول النماذج ، يرجى مراجعة التقرير الفني.
 
@@ -105,6 +116,7 @@ Fish Audio S2 هو أحدث نموذج تم تطويره بواسطة [Fish Audi
 ### استنساخ الصوت السريع
 
 يدعم Fish Audio S2 استنساخ الصوت الدقيق باستخدام عينات مرجعية قصيرة (عادة 10-30 ثانية). يمكن للنموذج التقاط نبرة الصوت وأسلوب التحدث والميل العاطفي ، وتوليد أصوات مستنسخة واقعية ومتسقة دون ضبط دقيق إضافي.
+لاستخدام خادم SGLang، راجع https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md .
 
 ---
 

+ 10 - 1
docs/en/index.md

@@ -57,6 +57,15 @@ This is the official documentation for Fish Speech. Please follow the instructio
 - [Server Inference](server.md)
 - [Docker Setup](install.md#docker-setup)
 
+!!! note
+    For SGLang server, please read [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md).
+
+### For LLM Agent
+
+```text
+Install and configure Fish-Audio S2 by following the instructions here: https://speech.fish.audio/install/
+```
+
 ## Fish Audio S2
 **The best text-to-speech system in both open-source and closed-source**
 
@@ -73,7 +82,6 @@ Please visit the [Fish Audio website](https://fish.audio/) for a real-time exper
 | Model | Size | Availability | Description |
 |------|------|-------------|-------------|
 | S2-Pro | 4B Parameters | [huggingface](https://huggingface.co/fishaudio/s2-pro) | Full-featured flagship model with the highest quality and stability |
-| S2-Flash | - - - - | [fish.audio](https://fish.audio/) | Our closed-source model with faster speed and lower latency |
 
 For more details on the models, please see the technical report.
 
@@ -108,6 +116,7 @@ Thanks to the expansion of the model's context, our model can now use the inform
 ### Fast Voice Cloning
 
 Fish Audio S2 supports accurate voice cloning using short reference samples (typically 10-30 seconds). The model can capture timbre, speaking style, and emotional tendency, generating realistic and consistent cloned voices without additional fine-tuning.
+Please refer to https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md to use the SGLang server.
 
 ---
 

+ 14 - 2
docs/ja/index.md

@@ -52,7 +52,19 @@
 これは Fish Speech の公式ドキュメントです。説明に従って簡単に使い始めることができます。
 
 - [インストール](install.md)
-- [推論](inference.md)
+- [コマンドライン推論](inference.md)
+- [WebUI 推論](inference.md)
+- [サーバー推論](server.md)
+- [Docker セットアップ](install.md)
+
+!!! note
+    SGLang サーバーについては [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md) を参照してください。
+
+### LLM Agent 向け
+
+```text
+https://speech.fish.audio/ja/install/ の手順に従って、Fish Audio S2 をインストール・設定してください。
+```
 
 ## Fish Audio S2
 **オープンソースおよびクローズドソースの中で最高峰のテキスト読み上げシステム**
@@ -70,7 +82,6 @@ S2 シリーズには複数のモデルが含まれており、オープンソ
 | モデル | サイズ | 利用可能性 | 説明 |
 |------|------|-------------|-------------|
 | S2-Pro | 4B パラメータ | [huggingface](https://huggingface.co/fishaudio/s2-pro) | 最高品質と安定性を備えたフル機能のフラッグシップモデル |
-| S2-Flash | - - - - | [fish.audio](https://fish.audio/) | より高速で低遅延のクローズドソースモデル |
 
 モデルの詳細については、技術レポートを参照してください。
 
@@ -105,6 +116,7 @@ Fish Audio S2 では、ユーザーが複数の話者を含むリファレンス
 ### 高速音声クローン
 
 Fish Audio S2 は、短いリファレンスサンプル(通常 10〜30 秒)を使用した正確な音声クローンをサポートしています。モデルは音色、話し方、感情的な傾向を捉えることができ、追加の微調整なしでリアルで一貫したクローン音声を生成できます。
+SGLang サーバーの利用については https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md を参照してください。
 
 ---
 

+ 14 - 2
docs/ko/index.md

@@ -52,7 +52,19 @@
 Fish Speech의 공식 문서입니다. 지침에 따라 쉽게 시작할 수 있습니다.
 
 - [설치](install.md)
-- [추론](inference.md)
+- [커맨드라인 추론](inference.md)
+- [WebUI 추론](inference.md)
+- [서버 추론](server.md)
+- [Docker 설정](install.md)
+
+!!! note
+    SGLang 서버는 [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md)를 참고하세요.
+
+### LLM Agent 가이드
+
+```text
+https://speech.fish.audio/ko/install/ 문서를 따라 Fish Audio S2를 설치하고 구성하세요.
+```
 
 ## Fish Audio S2
 **오픈 소스 및 클로즈드 소스 중 최고봉의 텍스트 음성 변환 시스템**
@@ -70,7 +82,6 @@ S2 시리즈에는 여러 모델이 포함되어 있으며, 오픈 소스 모델
 | 모델 | 크기 | 가용성 | 설명 |
 |------|------|-------------|-------------|
 | S2-Pro | 4B 매개변수 | [huggingface](https://huggingface.co/fishaudio/s2-pro) | 최고의 품질과 안정성을 갖춘 풀 기능 플래그십 모델 |
-| S2-Flash | - - - - | [fish.audio](https://fish.audio/) | 더 빠른 속도와 짧은 지연 시간을 갖춘 클로즈드 소스 모델 |
 
 모델에 대한 자세한 내용은 기술 보고서를 참조하십시오.
 
@@ -105,6 +116,7 @@ Fish Audio S2를 사용하면 사용자가 여러 화자가 포함된 참조 오
 ### 빠른 음성 클로닝
 
 Fish Audio S2는 짧은 참조 샘플(보통 10~30초)을 사용한 정확한 음성 클로닝을 지원합니다. 모델은 음색, 말하기 스타일 및 감정적 경향을 포착할 수 있으며, 추가 미세 조정 없이도 사실적이고 일관된 클로닝 음성을 생성할 수 있습니다.
+SGLang 서버 사용은 https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md 를 참고하세요.
 
 ---
 

+ 14 - 2
docs/pt/index.md

@@ -52,7 +52,19 @@
 Esta é a documentação oficial do Fish Speech. Siga as instruções para começar facilmente.
 
 - [Instalação](install.md)
-- [Inferência](inference.md)
+- [Inferência por Linha de Comando](inference.md)
+- [Inferência WebUI](inference.md)
+- [Inferência via Servidor](server.md)
+- [Configuração Docker](install.md)
+
+!!! note
+    Para servidor com SGLang, consulte o [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md).
+
+### Guia para agentes LLM
+
+```text
+Instale e configure o Fish Audio S2 seguindo as instruções em https://speech.fish.audio/pt/install/ .
+```
 
 ## Fish Audio S2
 **O melhor sistema de texto para fala em código aberto e código fechado**
@@ -70,7 +82,6 @@ Visite o [site da Fish Audio](https://fish.audio/) para uma experiência em temp
 | Modelo | Tamanho | Disponibilidade | Descrição |
 |------|------|-------------|-------------|
 | S2-Pro | 4B Parâmetros | [huggingface](https://huggingface.co/fishaudio/s2-pro) | Modelo emblemático completo com a mais alta qualidade e estabilidade |
-| S2-Flash | - - - - | [fish.audio](https://fish.audio/) | Nosso modelo de código fechado com maior velocidade e menor latência |
 
 Para mais detalhes sobre os modelos, consulte o relatório técnico.
 
@@ -105,6 +116,7 @@ Graças à expansão do contexto do modelo, nosso modelo agora pode usar as info
 ### Clonagem de Voz Rápida
 
 O Fish Audio S2 suporta clonagem de voz precisa usando amostras de referência curtas (geralmente de 10 a 30 segundos). O modelo pode capturar timbre, estilo de fala e tendência emocional, gerando vozes clonadas realistas e consistentes sem ajuste fino adicional.
+Para usar o servidor SGLang, consulte https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md .
 
 ---
 

+ 15 - 3
docs/zh/index.md

@@ -52,7 +52,19 @@
 这里是 Fish Speech 的官方文档,请按照说明轻松入门。
 
 - [安装](install.md)
-- [推理](inference.md)
+- [命令行推理](inference.md)
+- [WebUI 推理](inference.md)
+- [服务端推理](server.md)
+- [Docker 部署](install.md)
+
+!!! note
+    如需 SGLang Server,请参考 [SGLang-Omni README](https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md)。
+
+### LLM Agent 指南
+
+```text
+请先阅读 https://speech.fish.audio/zh/install/ ,并按文档安装和配置 Fish Audio S2。
+```
 
 ## Fish Audio S2
 **开源和闭源中最出色的文本转语音系统**
@@ -69,8 +81,7 @@ S2 系列包含多个模型,开源模型为 S2-Pro,是该系列中性能最
 
 | 模型 | 大小 | 可用性 | 描述 |
 |------|------|-------------|-------------|
-| S2-Pro | 4B 参数 | [huggingface]() | 功能齐全的旗舰模型,具有最高质量和稳定性 |
-| S2-Flash | - - - - | [fish.audio](https://fish.audio/) | 我们的闭源模型,具有更快的速度和更低的延迟 |
+| S2-Pro | 4B 参数 | [huggingface](https://huggingface.co/fishaudio/s2-pro) | 功能齐全的旗舰模型,具有最高质量和稳定性 |
 
 有关模型的更多详情,请参见技术报告。
 
@@ -105,6 +116,7 @@ Fish Audio S2 允许用户上传包含多个说话人的参考音频,模型将
 ### 快速语音克隆
 
 Fish Audio S2 支持使用短参考样本(通常为 10-30 秒)进行准确的语音克隆。模型可以捕捉音色、说话风格和情感倾向,无需额外微调即可生成逼真且一致的克隆语音。
+如需使用 SGLang Server,请参考 https://github.com/sgl-project/sglang-omni/blob/main/sglang_omni/models/fishaudio_s2_pro/README.md 。
 
 ---
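
Review note: the six README hunks in this commit all correct the same defect, a stray period inside the link target (`README.md.)` became `README.md)`). A check along the following lines might catch that pattern before it is committed. This is only a minimal sketch: the function name `find_trailing_dot_links` and the regular expression are illustrative assumptions, not part of this repository or its pre-commit configuration.

```python
import re

# Matches inline Markdown links of the form [text](url).
# Assumption: URLs contain no spaces or closing parentheses.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def find_trailing_dot_links(markdown_text):
    """Return (line_number, url) pairs for links whose URL ends with a period."""
    hits = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        for match in LINK_RE.finditer(line):
            url = match.group(2)
            if url.endswith("."):
                hits.append((lineno, url))
    return hits


# The removed and added lines from the README.md hunk of this commit.
before = (
    "For SGLang server, please read [SGLang-Omni README]"
    "(https://github.com/sgl-project/sglang-omni/blob/main/"
    "sglang_omni/models/fishaudio_s2_pro/README.md.)"
)
after = (
    "> **For SGLang server, please read [SGLang-Omni README]"
    "(https://github.com/sgl-project/sglang-omni/blob/main/"
    "sglang_omni/models/fishaudio_s2_pro/README.md).**"
)

print(find_trailing_dot_links(before))  # one hit on line 1
print(find_trailing_dot_links(after))   # no hits
```

Running it over each changed Markdown file and failing when the returned list is non-empty would flag the old link form while accepting the corrected one.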