Update README.md for OpenAudio-S1 (#998)

* [feature]add dataset class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [dev]combine agent and tts infer

* [feature]:update inference

* [feature]:update uv.lock

* [Merge]:merge upstream/main

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [fix]:remove unused files

* [fix]:remove unused files

* [fix]:remove unused files

* [fix]:fix infer bugs

* [docs]:update introduction and optimize front appearance

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [docs]:update README for OpenAudio-S1

* [docs]:update docs

* [docs]:Update video

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Whale and Dolphin 10 months ago
parent
commit
9efa2087bd
12 changed files with 922 additions and 366 deletions
  1. 121 38
      README.md
  2. 123 40
      docs/README.ja.md
  3. 123 40
      docs/README.ko.md
  4. 122 39
      docs/README.pt-BR.md
  5. 120 37
      docs/README.zh.md
  6. binary
      docs/assets/Elo.jpg
  7. binary
      docs/assets/Thumbnail.jpg
  8. 50 24
      docs/en/index.md
  9. 63 37
      docs/ja/index.md
  10. 73 36
      docs/ko/index.md
  11. 63 37
      docs/pt/index.md
  12. 64 38
      docs/zh/index.md

+ 121 - 38
README.md

@@ -26,76 +26,159 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-This codebase is released under Apache License and all model weights are released under CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](LICENSE) for more details.
+> [!IMPORTANT]
+> **License Notice**  
+> This codebase is released under **Apache License** and all model weights are released under **CC-BY-NC-SA-4.0 License**. Please refer to [LICENSE](LICENSE) for more details.
+
+> [!WARNING]
+> **Legal Disclaimer**  
+> We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
+
+---
+
+## 🎉 Announcement
+
+We are excited to announce that we have rebranded to **OpenAudio** — introducing a revolutionary new series of advanced Text-to-Speech models that builds upon the foundation of Fish-Speech.
+
+We are proud to release **OpenAudio-S1** as the first model in this series, delivering significant improvements in quality, performance, and capabilities.
+
+OpenAudio-S1 comes in two versions: **OpenAudio-S1** and **OpenAudio-S1-mini**. Both models are now available on [Fish Audio Playground](https://fish.audio) (for **OpenAudio-S1**) and [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini) (for **OpenAudio-S1-mini**).
+
+Visit the [OpenAudio website](https://openaudio.com/blogs/s1) for blog & tech report.
+
+## Highlights ✨
+
+### **Excellent TTS quality**
+
+We use Seed TTS Eval Metrics to evaluate the model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, which is significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008**  | **0.004**  | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **Best Model in TTS-Arena2** 🏆
+
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="docs/assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
+
+OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
+
+- **Basic emotions**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted) 
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
+```
+
+- **Advanced emotions**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)
+```
+
+- **Tone markers**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
+
+- **Special audio effects**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
+
+You can also write laughter such as "Ha, ha, ha" directly in the text to control the output; many other cases are left for you to explore.
+
+(Currently supports English, Chinese, and Japanese; more languages are coming soon!)
 
-We are excited to announce that we have changed our name into OpenAudio, this will be a brand new series of Text-to-Speech model.
+### **Two Types of Models**
 
-Demo available at [Fish Audio Playground](https://fish.audio).
+| Model | Size | Availability | Features |
+|-------|------|--------------|----------|
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
+| **S1-mini** | 0.5B parameters | Available on Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
 
-Visit the [OpenAudio website](https://openaudio.com) for blog & tech report.
+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
-## Features
-### OpenAudio-S1 (Fish-Speech's new verison)
+## **Features**
 
-1. This model has **ALL FEATURES** that fish-speech had.
+1. **Zero-shot & Few-shot TTS:** Input a 10 to 30-second vocal sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
 
-2. OpenAudio S1 supports a variety of emotional, tone, and special markers to enhance speech synthesis:
-   
-   (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
+2. **Multilingual & Cross-lingual Support:** Simply copy and paste multilingual text into the input box—no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
 
-   Also supports tone marker:
+3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
 
-   (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+4. **Highly Accurate:** Achieves a low CER (Character Error Rate) of around 0.4% and WER (Word Error Rate) of around 0.8% for Seed-TTS Eval.
 
-    There's a few special markers that are supported:
+5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
 
-    (laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting) (groaning) (crowd laughing) (background laughter) (audience laughing)
+6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
 
-    You can also use **Ha,ha,ha** to control, there's many other cases waiting to be explored by yourself.
+7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
-3. The OpenAudio S1 includes the following sizes:
--   **S1 (4B, proprietary):** The full-sized model.
--   **S1-mini (0.5B, open-sourced):** A distilled version of S1.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
 
-    Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
+## **Media & Demos**
 
-4. Evaluations
+<div align="center">
 
-    **Seed TTS Eval Metrics (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM):**
+### **Social Media**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
 
-    -   **S1:**
-        -   WER (Word Error Rate): **0.008**
-        -   CER (Character Error Rate): **0.004**
-        -   Distance: **0.332**
-    -   **S1-mini:**
-        -   WER (Word Error Rate): **0.011**
-        -   CER (Character Error Rate): **0.005**
-        -   Distance: **0.380**
-    
+### **Interactive Demos**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
 
-## Disclaimer
+### **Video Showcases**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
 
-We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
+### **Audio Samples**
+<div style="margin: 20px 0;">
+    <em> High-quality audio samples will be available soon, demonstrating our multilingual TTS capabilities across different languages and emotions.</em>
+</div>
 
-## Videos
+</div>
 
-#### To be continued.
+---
 
 ## Documents
 
 - [Build Environment](docs/en/install.md)
 - [Inference](docs/en/inference.md)
 
-It should be noted that the current model **DOESN'T SUPPORT FINETUNE**.
-
 ## Credits
 
 - [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
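
To illustrate the speech-control markers listed in the README.md diff above, here is a minimal Python sketch that prefixes a sentence with one or more markers. It assumes that markers are written inline, in parentheses, ahead of the text they affect; the diff lists the markers but does not spell out placement rules. The helper name `with_markers` and the `KNOWN_MARKERS` subset are hypothetical and are not part of the OpenAudio codebase.

```python
# Hypothetical helper: the marker names are copied from the README lists,
# but the placement convention (markers before the text) is an assumption.
KNOWN_MARKERS = {
    # a few basic emotions
    "angry", "sad", "excited", "surprised",
    # tone markers
    "in a hurry tone", "shouting", "screaming", "whispering", "soft tone",
    # a few special audio effects
    "laughing", "chuckling", "sighing",
}


def with_markers(text: str, *markers: str) -> str:
    """Return `text` prefixed with parenthesised markers, e.g. "(sad) Hello"."""
    unknown = [m for m in markers if m not in KNOWN_MARKERS]
    if unknown:
        raise ValueError(f"unknown marker(s): {', '.join(unknown)}")
    prefix = " ".join(f"({m})" for m in markers)
    return f"{prefix} {text}" if prefix else text


if __name__ == "__main__":
    print(with_markers("I can't believe it!", "excited"))
    print(with_markers("Please keep your voice down.", "sad", "whispering"))
```

Validating against a fixed set catches misspelled markers before the text reaches the model; extend `KNOWN_MARKERS` with the remaining entries from the lists as needed.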

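
The WER and CER figures quoted in the README.md diff above are edit-distance ratios. The sketch below shows how such ratios are conventionally computed: the Levenshtein distance between a reference transcript and a recognized transcript, divided by the reference length in words (WER) or characters (CER). It is only an illustration of the definitions, not the Seed TTS Eval pipeline itself, which also involves transcribing the audio and normalizing the text; the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            ))
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat"
    print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

Under these definitions, a WER of 0.008 corresponds to roughly eight word errors per thousand reference words.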
+ 123 - 40
docs/README.ja.md

@@ -26,75 +26,157 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-このコードベースはApache Licenseの下でリリースされ、すべてのモデルウェイトはCC-BY-NC-SA-4.0 Licenseの下でリリースされています。詳細については[LICENSE](../LICENSE)をご参照ください。
+> [!IMPORTANT]
+> **ライセンス注意事項**  
+> このコードベースは**Apache License**の下でリリースされ、すべてのモデルウェイトは**CC-BY-NC-SA-4.0 License**の下でリリースされています。詳細については[LICENSE](../LICENSE)をご参照ください。
+
+> [!WARNING]
+> **法的免責事項**  
+> 私たちはコードベースの不法な使用について一切の責任を負いません。DMCA及びその他の関連法律について、現地の法律をご参照ください。
+
+---
+
+## 🎉 発表
 
-私たちは名前をOpenAudioに変更したことをお知らせでき、嬉しく思います。これは全く新しいText-to-Speechモデルシリーズになります。
+**OpenAudio**へのリブランドを発表できることを嬉しく思います。Fish-Speechの基盤を元に構築された、革新的な新しい高度Text-to-Speechモデルシリーズを紹介します。
 
-デモは[Fish Audio Playground](https://fish.audio)で利用可能です。
+このシリーズの最初のモデルとして**OpenAudio-S1**をリリースできることを誇りに思います。品質、性能、機能において大幅な改善を実現しました。
 
-ブログと技術レポートについては[OpenAudioウェブサイト](https://openaudio.com)をご覧ください。
+OpenAudio-S1には2つのバージョンがあります:**OpenAudio-S1**と**OpenAudio-S1-mini**。両モデルとも[Fish Audio Playground](https://fish.audio)(**OpenAudio-S1**用)と[Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini)(**OpenAudio-S1-mini**用)で利用可能です。
 
-## 機能
-### OpenAudio-S1 (Fish-Speechの新バージョン)
+ブログと技術レポートについては[OpenAudioウェブサイト](https://openaudio.com/blogs/s1)をご覧ください。
 
-1. このモデルはfish-speechが持っていた**すべての機能**を持っています。
+## ハイライト ✨
 
-2. OpenAudio S1は音声合成を強化するための様々な感情、トーン、特別なマーカーをサポートしています:
-   
-      (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
+### **優秀なTTS品質**
 
-   またトーンマーカーもサポートしています:
+Seed TTS Eval Metricsを使用してモデル性能を評価した結果、OpenAudio S1は英語テキストで**0.008 WER**と**0.004 CER**を達成し、これは従来のモデルより大幅に優れています。(英語、自動評価、OpenAI gpt-4o-transcribeベース、Revai/pyannote-wespeaker-voxceleb-resnet34-LMを使用した話者距離)
 
-   (急いだトーン) (叫び) (絶叫) (ささやき) (柔らかいトーン)
+| モデル | 単語誤り率 (WER) | 文字誤り率 (CER) | 話者距離 |
+|-------|------------------|------------------|----------|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-    サポートされているいくつかの特別なマーカーがあります:
+### **TTS-Arena2でのベストモデル** 🏆
 
-    (笑い) (くすくす笑い) (すすり泣き) (大声で泣く) (ため息) (あえぎ) (うめき) (群衆の笑い) (背景の笑い) (観客の笑い)
+OpenAudio S1は、テキスト音声変換評価のベンチマークである[TTS-Arena2](https://arena.speechcolab.org/)で**1位**を獲得しました:
 
-    また、**ハ、ハ、ハ**を使って制御することもでき、あなた自身が探索を待っている他の多くのケースがあります。
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **音声制御**
+OpenAudio S1は**音声合成を強化するための様々な感情、トーン、特別なマーカーをサポート**しています:
 
-3. OpenAudio S1には以下のサイズが含まれています:
--   **S1 (4B, プロプライエタリ):** フルサイズのモデル。
--   **S1-mini (0.5B, オープンソース):** S1の蒸留版。
+- **基本感情**:
+```
+(怒った) (悲しい) (興奮した) (驚いた) (満足した) (喜んだ) 
+(恐れた) (心配した) (動揺した) (緊張した) (イライラした) (憂鬱な)
+(共感的な) (恥ずかしい) (嫌悪した) (感動した) (誇らしい) (リラックスした)
+(感謝する) (自信のある) (興味のある) (好奇心のある) (混乱した) (喜びに満ちた)
+```
 
-    S1とS1-miniの両方がオンライン人間フィードバック強化学習(RLHF)を組み込んでいます。
+- **高度な感情**:
+```
+(軽蔑的な) (不幸な) (不安な) (ヒステリックな) (無関心な) 
+(せっかちな) (罪悪感のある) (軽蔑した) (パニックした) (激怒した) (しぶしぶの)
+(熱心な) (不賛成の) (否定的な) (否認する) (驚愕した) (真剣な)
+(皮肉な) (宥める) (慰める) (誠実な) (冷笑する)
+(躊躇する) (屈服する) (苦痛な) (気まずい) (面白がる)
+```
 
-4. 評価
+- **トーンマーカー**:
+```
+(急いだトーン) (叫ぶ) (悲鳴) (囁く) (柔らかいトーン)
+```
 
-    **Seed TTS評価メトリクス(英語、自動評価、OpenAI gpt-4o-transcribeベース、Revai/pyannote-wespeaker-voxceleb-resnet34-LMを使用したスピーカー距離):**
+- **特別な音響効果**:
+```
+(笑う) (くすくす笑う) (すすり泣く) (大声で泣く) (ため息) (息切れ)
+(うめく) (群衆の笑い声) (背景の笑い声) (聴衆の笑い声)
+```
 
-    -   **S1:**
-        -   WER(単語誤り率):**0.008**
-        -   CER(文字誤り率):**0.004**
-        -   距離:**0.332**
-    -   **S1-mini:**
-        -   WER(単語誤り率):**0.011**
-        -   CER(文字誤り率):**0.005**
-        -   距離:**0.380**
-    
+また、「ハ、ハ、ハ」を使って制御することもでき、あなた自身が探索できる多くの他のケースがあります。
 
-## 免責事項
+(現在、英語、中国語、日本語をサポートしており、より多くの言語が近日公開予定です!)
 
-コードベースの違法な使用について、いかなる責任も負いません。DMCAおよびその他の関連法律に関する現地の法律をご参照ください。
+### **2種類のモデル**
 
-## 動画
+| モデル | サイズ | 利用可能性 | 機能 |
+|-------|--------|------------|------|
+| **S1** | 4Bパラメータ | [fish.audio](https://fish.audio)で利用可能 | フル機能のフラッグシップモデル |
+| **S1-mini** | 0.5Bパラメータ | huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini)で利用可能 | コア機能を持つ蒸留版 |
 
-#### 続く予定。
+S1とS1-miniの両方がオンライン人間フィードバック強化学習(RLHF)を組み込んでいます。
 
-## ドキュメント
+## **機能**
 
-- [環境構築](en/install.md)
-- [推論](en/inference.md)
+1. **ゼロショット・少数ショットTTS:** 10〜30秒の音声サンプルを入力して高品質のTTS出力を生成します。**詳細なガイドラインについては、[Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)をご覧ください。**
+
+2. **多言語・言語横断サポート:** 多言語テキストを入力ボックスにコピー&ペーストするだけで、言語を気にする必要はありません。現在、英語、日本語、韓国語、中国語、フランス語、ドイツ語、アラビア語、スペイン語をサポートしています。
+
+3. **音素依存なし:** モデルは強い汎化能力を持ち、TTSに音素に依存しません。どの言語の文字体系のテキストも処理できます。
+
+4. **高精度:** Seed-TTS Evalで約0.4%の低いCER(文字誤り率)と約0.8%のWER(単語誤り率)を達成します。
+
+5. **高速:** fish-tech加速により、Nvidia RTX 4060ラップトップで約1:5、Nvidia RTX 4090で約1:15のリアルタイム係数を実現します。
+
+6. **WebUI推論:** Chrome、Firefox、Edge、その他のブラウザと互換性のある使いやすいGradioベースのWeb UIを提供します。
+
+7. **GUI推論:** APIサーバーとシームレスに動作するPyQt6グラフィカルインターフェースを提供します。Linux、Windows、macOSをサポートします。[GUIを見る](https://github.com/AnyaCoder/fish-speech-gui)。
+
+8. **デプロイフレンドリー:** Linux、Windows(macOSは近日公開予定)のネイティブサポートで推論サーバーを簡単にセットアップし、速度損失を最小限に抑えます。
+
+## **メディア・デモ**
+
+<div align="center">
+
+### **ソーシャルメディア**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+### **インタラクティブデモ**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+### **ビデオショーケース**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+
+### **音声サンプル**
+<div style="margin: 20px 0;">
+    <em>高品質の音声サンプルは間もなく公開予定で、異なる言語と感情における私たちの多言語TTS機能を実演します。</em>
+</div>
+
+</div>
+
+---
+
+## ドキュメント
 
-現在のモデルは**ファインチューニングをサポートしていない**ことに注意してください。
+- [環境構築](ja/install.md)
+- [推論](ja/inference.md)
 
 ## クレジット
 
@@ -104,6 +186,7 @@
 - [MQTTS](https://github.com/b04901014/MQTTS)
 - [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
 - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- [Qwen3](https://github.com/QwenLM/Qwen3)
 
 ## 技術レポート (V1.4)
 ```bibtex

+ 123 - 40
docs/README.ko.md

@@ -26,75 +26,157 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-이 코드베이스는 Apache License 하에 릴리스되며, 모든 모델 가중치는 CC-BY-NC-SA-4.0 License 하에 릴리스됩니다. 자세한 내용은 [LICENSE](../LICENSE)를 참조하세요.
+> [!IMPORTANT]
+> **라이선스 고지사항**  
+> 이 코드베이스는 **Apache License** 하에 릴리스되며, 모든 모델 가중치는 **CC-BY-NC-SA-4.0 License** 하에 릴리스됩니다. 자세한 내용은 [LICENSE](../LICENSE)를 참조하세요.
+
+> [!WARNING]
+> **법적 면책조항**  
+> 저희는 코드베이스의 불법적인 사용에 대해 어떠한 책임도 지지 않습니다. DMCA 및 기타 관련 법률에 대한 현지 법률을 참조하세요.
+
+---
+
+## 🎉 발표
 
-저희는 이름을 OpenAudio로 변경했다고 발표하게 되어 기쁩니다. 이는 완전히 새로운 Text-to-Speech 모델 시리즈가 될 것입니다.
+**OpenAudio**로의 리브랜딩을 발표하게 되어 기쁩니다. Fish-Speech의 기반 위에 구축된 혁신적인 새로운 고급 Text-to-Speech 모델 시리즈를 소개합니다.
 
-데모는 [Fish Audio Playground](https://fish.audio)에서 사용할 수 있습니다.
+이 시리즈의 첫 번째 모델인 **OpenAudio-S1**을 출시하게 되어 자랑스럽습니다. 품질, 성능, 기능에서 상당한 개선을 제공합니다.
 
-블로그와 기술 보고서는 [OpenAudio 웹사이트](https://openaudio.com)를 방문하세요.
+OpenAudio-S1은 두 가지 버전으로 제공됩니다: **OpenAudio-S1**과 **OpenAudio-S1-mini**. 두 모델 모두 [Fish Audio Playground](https://fish.audio)(**OpenAudio-S1**용)와 [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini)(**OpenAudio-S1-mini**용)에서 사용할 수 있습니다.
 
-## 기능
-### OpenAudio-S1 (Fish-Speech의 새 버전)
+블로그와 기술 보고서는 [OpenAudio 웹사이트](https://openaudio.com/blogs/s1)를 방문하세요.
 
-1. 이 모델은 fish-speech가 가지고 있던 **모든 기능**을 가지고 있습니다.
+## 주요 특징 ✨
 
-2. OpenAudio S1은 음성 합성을 향상시키기 위한 다양한 감정, 톤, 특별한 마커를 지원합니다:
-   
-      (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
+### **뛰어난 TTS 품질**
 
-   또한 톤 마커도 지원합니다:
+우리는 Seed TTS Eval Metrics를 사용하여 모델 성능을 평가했으며, 결과에 따르면 OpenAudio S1은 영어 텍스트에서 **0.008 WER**과 **0.004 CER**을 달성하여 이전 모델들보다 상당히 우수한 성능을 보입니다. (영어, 자동 평가, OpenAI gpt-4o-transcribe 기반, Revai/pyannote-wespeaker-voxceleb-resnet34-LM을 사용한 화자 거리)
 
-   (급한 톤) (외치기) (비명지르기) (속삭이기) (부드러운 톤)
+| 모델 | 단어 오류율 (WER) | 문자 오류율 (CER) | 화자 거리 |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008**  | **0.004**  | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-    지원되는 몇 가지 특별한 마커가 있습니다:
+### **TTS-Arena2 최고 모델** 🏆
 
-    (웃음) (킥킥거림) (흐느낌) (큰 소리로 우는 것) (한숨) (헐떡거림) (신음) (군중 웃음) (배경 웃음) (관객 웃음)
+OpenAudio S1은 텍스트 음성 변환 평가의 벤치마크인 [TTS-Arena2](https://arena.speechcolab.org/)에서 **1위**를 달성했습니다:
 
-    또한 **하, 하, 하**를 사용하여 제어할 수도 있으며, 여러분이 직접 탐험할 수 있는 많은 다른 경우들이 있습니다.
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 순위" style="width: 75%;" />
+</div>
+
+### **음성 제어**
+OpenAudio S1은 **음성 합성을 향상시키기 위한 다양한 감정, 톤, 특별한 마커를 지원**합니다:
 
-3. OpenAudio S1은 다음 크기를 포함합니다:
--   **S1 (4B, 독점):** 전체 크기 모델.
--   **S1-mini (0.5B, 오픈소스):** S1의 증류 버전.
+- **기본 감정**:
+```
+(화난) (슬픈) (흥분한) (놀란) (만족한) (기쁜) 
+(무서워하는) (걱정하는) (속상한) (긴장한) (좌절한) (우울한)
+(공감하는) (당황한) (역겨워하는) (감동한) (자랑스러운) (편안한)
+(감사하는) (자신있는) (관심있는) (호기심있는) (혼란스러운) (즐거운)
+```
 
-    S1과 S1-mini 모두 온라인 인간 피드백 강화학습(RLHF)을 통합하고 있습니다.
+- **고급 감정**:
+```
+(경멸하는) (불행한) (불안한) (히스테리한) (무관심한) 
+(조급한) (죄책감있는) (냉소적인) (공황상태인) (분노한) (마지못한)
+(열성적인) (반대하는) (부정적인) (부인하는) (놀란) (진지한)
+(비꼬는) (달래는) (위로하는) (진심인) (비웃는)
+(망설이는) (굴복하는) (고통스러운) (어색한) (재미있어하는)
+```
 
-4. 평가
+- **톤 마커**:
+```
+(급한 톤) (외치기) (비명지르기) (속삭이기) (부드러운 톤)
+```
 
-    **Seed TTS 평가 메트릭 (영어, 자동 평가, OpenAI gpt-4o-transcribe 기반, Revai/pyannote-wespeaker-voxceleb-resnet34-LM을 사용한 화자 거리):**
+- **특별한 오디오 효과**:
+```
+(웃음) (킥킥거림) (흐느낌) (큰 소리로 우는 것) (한숨) (헐떡거림)
+(신음) (군중 웃음) (배경 웃음) (관객 웃음)
+```
 
-    -   **S1:**
-        -   WER (단어 오류율): **0.008**
-        -   CER (문자 오류율): **0.004**
-        -   거리: **0.332**
-    -   **S1-mini:**
-        -   WER (단어 오류율): **0.011**
-        -   CER (문자 오류율): **0.005**
-        -   거리: **0.380**
-    
+또한 **하, 하, 하**를 사용하여 제어할 수도 있으며, 여러분이 직접 탐험할 수 있는 많은 다른 경우들이 있습니다.
 
-## 면책 조항
+(현재 영어, 중국어, 일본어를 지원하며, 더 많은 언어가 곧 추가될 예정입니다!)
 
-저희는 코드베이스의 불법적인 사용에 대해 어떠한 책임도 지지 않습니다. DMCA 및 기타 관련 법률에 대한 현지 법률을 참조하세요.
+### **두 가지 유형의 모델**
 
-## 비디오
+| 모델 | 크기 | 가용성 | 특징 |
+|-------|------|--------------|----------|
+| **S1** | 4B 매개변수 | [fish.audio](https://fish.audio)에서 사용 가능 | 모든 기능을 갖춘 플래그십 모델 |
+| **S1-mini** | 0.5B 매개변수 | 허깅페이스 [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini)에서 사용 가능 | 핵심 기능을 갖춘 증류 버전 |
 
-#### 계속될 예정입니다.
+S1과 S1-mini 모두 온라인 인간 피드백 강화학습(RLHF)을 통합하고 있습니다.
 
-## 문서
+## **기능**
 
-- [환경 구축](en/install.md)
-- [추론](en/inference.md)
+1. **제로샷 및 퓨샷 TTS:** 10~30초의 음성 샘플을 입력하여 고품질 TTS 출력을 생성합니다. **자세한 가이드라인은 [음성 복제 모범 사례](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)를 참조하세요.**
+
+2. **다국어 및 교차 언어 지원:** 다국어 텍스트를 입력 상자에 복사하여 붙여넣기만 하면 됩니다. 언어를 걱정할 필요가 없습니다. 현재 영어, 일본어, 한국어, 중국어, 프랑스어, 독일어, 아랍어, 스페인어를 지원합니다.
+
+3. **음소 의존성 없음:** 모델은 강력한 일반화 능력을 가지고 있으며 TTS를 위해 음소에 의존하지 않습니다. 모든 언어 스크립트의 텍스트를 처리할 수 있습니다.
+
+4. **높은 정확도:** Seed-TTS Eval에서 약 0.4%의 낮은 CER(문자 오류율)과 약 0.8%의 WER(단어 오류율)을 달성합니다.
+
+5. **빠른 속도:** fish-tech 가속화로 Nvidia RTX 4060 노트북에서 실시간 팩터가 약 1:5, Nvidia RTX 4090에서 1:15입니다.
+
+6. **WebUI 추론:** Chrome, Firefox, Edge 및 기타 브라우저와 호환되는 사용하기 쉬운 Gradio 기반 웹 UI를 제공합니다.
+
+7. **GUI 추론:** API 서버와 완벽하게 작동하는 PyQt6 그래픽 인터페이스를 제공합니다. Linux, Windows, macOS를 지원합니다. [GUI 보기](https://github.com/AnyaCoder/fish-speech-gui).
+
+8. **배포 친화적:** Linux, Windows(MacOS 곧 출시 예정)에 대한 네이티브 지원으로 추론 서버를 쉽게 설정할 수 있으며, 속도 손실을 최소화합니다.
+
+## **미디어 및 데모**
+
+<div align="center">
+
+### **소셜 미디어**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="X에서 최신 데모" />
+</a>
+
+### **인터랙티브 데모**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="OpenAudio S1 체험하기" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="S1 Mini 체험하기" />
+</a>
+
+### **비디오 쇼케이스**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+
+### **오디오 샘플**
+<div style="margin: 20px 0;">
+    <em> 다양한 언어와 감정에 걸친 다국어 TTS 기능을 보여주는 고품질 오디오 샘플이 곧 제공될 예정입니다.</em>
+</div>
+
+</div>
+
+---
+
+## 문서
 
-현재 모델은 **파인튜닝을 지원하지 않는다**는 점에 유의해야 합니다.
+- [환경 구축](ko/install.md)
+- [추론](ko/inference.md)
 
 ## 크레딧
 
@@ -104,6 +186,7 @@
 - [MQTTS](https://github.com/b04901014/MQTTS)
 - [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
 - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- [Qwen3](https://github.com/QwenLM/Qwen3)
 
 ## 기술 보고서 (V1.4)
 ```bibtex

+ 122 - 39
docs/README.pt-BR.md

@@ -26,75 +26,157 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-Esta base de código é lançada sob a Licença Apache e todos os pesos dos modelos são lançados sob a Licença CC-BY-NC-SA-4.0. Consulte [LICENSE](../LICENSE) para mais detalhes.
+> [!IMPORTANT]
+> **Aviso de Licença**  
+> Esta base de código é lançada sob a **Licença Apache** e todos os pesos dos modelos são lançados sob a **Licença CC-BY-NC-SA-4.0**. Consulte [LICENSE](../LICENSE) para mais detalhes.
+
+> [!WARNING]
+> **Isenção de Responsabilidade Legal**  
+> Não assumimos qualquer responsabilidade pelo uso ilegal da base de código. Consulte as leis locais sobre DMCA e outras leis relacionadas.
+
+---
+
+## 🎉 Anúncio
+
+Estamos animados em anunciar que mudamos nossa marca para **OpenAudio** — introduzindo uma nova série revolucionária de modelos avançados de Text-to-Speech que se baseia na fundação do Fish-Speech.
+
+Temos o orgulho de lançar o **OpenAudio-S1** como o primeiro modelo desta série, oferecendo melhorias significativas em qualidade, desempenho e capacidades.
+
+O OpenAudio-S1 vem em duas versões: **OpenAudio-S1** e **OpenAudio-S1-mini**. Ambos os modelos estão agora disponíveis no [Fish Audio Playground](https://fish.audio) (para **OpenAudio-S1**) e [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini) (para **OpenAudio-S1-mini**).
+
+Visite o [site OpenAudio](https://openaudio.com/blogs/s1) para blog e relatório técnico.
+
+## Destaques ✨
+
+### **Excelente qualidade TTS**
+
+Usamos as métricas de avaliação Seed TTS para avaliar o desempenho do modelo, e os resultados mostram que o OpenAudio S1 alcança **0.008 WER** e **0.004 CER** em texto em inglês, que é significativamente melhor que modelos anteriores. (Inglês, avaliação automática, baseada no OpenAI gpt-4o-transcribe, distância do locutor usando Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Modelo | Taxa de Erro de Palavra (WER) | Taxa de Erro de Caractere (CER) | Distância do Locutor |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008**  | **0.004**  | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **Melhor Modelo no TTS-Arena2** 🏆
+
+O OpenAudio S1 alcançou a **classificação #1** no [TTS-Arena2](https://arena.speechcolab.org/), o benchmark para avaliação de text-to-speech:
+
+<div align="center">
+    <img src="assets/Elo.jpg" alt="Classificação TTS-Arena2" style="width: 75%;" />
+</div>
+
+### **Controle de Fala**
+O OpenAudio S1 **suporta uma variedade de marcadores emocionais, de tom e especiais** para aprimorar a síntese de fala:
+
+- **Emoções básicas**:
+```
+(raivoso) (triste) (animado) (surpreso) (satisfeito) (encantado) 
+(assustado) (preocupado) (chateado) (nervoso) (frustrado) (deprimido)
+(empático) (envergonhado) (enojado) (emocionado) (orgulhoso) (relaxado)
+(grato) (confiante) (interessado) (curioso) (confuso) (alegre)
+```
 
-Estamos animados em anunciar que mudamos nosso nome para OpenAudio, esta será uma nova série de modelos Text-to-Speech.
+- **Emoções avançadas**:
+```
+(desdenhoso) (infeliz) (ansioso) (histérico) (indiferente) 
+(impaciente) (culpado) (desprezível) (em pânico) (furioso) (relutante)
+(entusiasmado) (desaprovador) (negativo) (negando) (espantado) (sério)
+(sarcástico) (conciliador) (consolador) (sincero) (escarnecedor)
+(hesitante) (cedendo) (doloroso) (constrangido) (divertido)
+```
 
-Demo disponível em [Fish Audio Playground](https://fish.audio).
+- **Marcadores de tom**:
+```
+(tom apressado) (gritando) (gritando alto) (sussurrando) (tom suave)
+```
 
-Visite o [site OpenAudio](https://openaudio.com) para blog e relatório técnico.
+- **Efeitos de áudio especiais**:
+```
+(rindo) (dando risinhos) (soluçando) (chorando alto) (suspirando) (ofegando)
+(gemendo) (risos da multidão) (risos de fundo) (risos da audiência)
+```
 
-## Recursos
-### OpenAudio-S1 (Nova versão do Fish-Speech)
+Você também pode usar Ha,ha,ha para controlar, há muitos outros casos esperando para serem explorados por você mesmo.
 
-1. Este modelo possui **TODOS OS RECURSOS** que o fish-speech tinha.
+(Suporte para inglês, chinês e japonês agora, e mais idiomas em breve!)
 
-2. O OpenAudio S1 suporta uma variedade de marcadores emocionais, de tom e especiais para aprimorar a síntese de fala:
+### **Dois Tipos de Modelos**
+
+| Modelo | Tamanho | Disponibilidade | Recursos |
+|-------|------|--------------|----------|
+| **S1** | 4B parâmetros | Disponível em [fish.audio](https://fish.audio) | Modelo flagship com recursos completos |
+| **S1-mini** | 0.5B parâmetros | Disponível no Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Versão destilada com capacidades principais |
+
+Tanto S1 quanto S1-mini incorporam Aprendizado por Reforço online com Feedback Humano (RLHF).
    
-      (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused)
+   ## **Recursos**
 
-   Também suporta marcadores de tom:
+1. **TTS Zero-shot e Few-shot:** Insira uma amostra vocal de 10 a 30 segundos para gerar saída TTS de alta qualidade. **Para diretrizes detalhadas, veja [Melhores Práticas de Clonagem de Voz](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
 
-   (tom apressado) (gritando) (berrando) (sussurrando) (tom suave)
+2. **Suporte Multilíngue e Cross-lingual:** Simplesmente copie e cole texto multilíngue na caixa de entrada—não precisa se preocupar com o idioma. Atualmente suporta inglês, japonês, coreano, chinês, francês, alemão, árabe e espanhol.
 
-    Há alguns marcadores especiais que são suportados:
+3. **Sem Dependência de Fonema:** O modelo tem fortes capacidades de generalização e não depende de fonemas para TTS. Pode lidar com texto em qualquer script de idioma.
 
-    (rindo) (dando risadinhas) (soluçando) (chorando alto) (suspirando) (ofegando) (gemendo) (multidão rindo) (riso de fundo) (audiência rindo)
+4. **Altamente Preciso:** Alcança um baixo CER (Taxa de Erro de Caractere) de cerca de 0.4% e WER (Taxa de Erro de Palavra) de cerca de 0.8% para Seed-TTS Eval.
 
-    Você também pode usar **Ha,ha,ha** para controlar, há muitos outros casos esperando para serem explorados por você mesmo.
+5. **Rápido:** Com aceleração fish-tech, o fator de tempo real é aproximadamente 1:5 em um laptop Nvidia RTX 4060 e 1:15 em um Nvidia RTX 4090.
 
-3. O OpenAudio S1 inclui os seguintes tamanhos:
--   **S1 (4B, proprietário):** O modelo de tamanho completo.
--   **S1-mini (0.5B, código aberto):** Uma versão destilada do S1.
+6. **Inferência WebUI:** Apresenta uma UI web baseada em Gradio fácil de usar, compatível com Chrome, Firefox, Edge e outros navegadores.
 
-    Tanto S1 quanto S1-mini incorporam Aprendizado por Reforço online a partir de Feedback Humano (RLHF).
+7. **Inferência GUI:** Oferece uma interface gráfica PyQt6 que funciona perfeitamente com o servidor de API. Suporta Linux, Windows e macOS. [Ver GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
-4. Avaliações
+8. **Amigável para Deploy:** Configure facilmente um servidor de inferência com suporte nativo para Linux, Windows (MacOS em breve), minimizando perda de velocidade.
 
-    **Métricas de Avaliação Seed TTS (Inglês, avaliação automática, baseada no OpenAI gpt-4o-transcribe, distância do locutor usando Revai/pyannote-wespeaker-voxceleb-resnet34-LM):**
+## **Mídia e Demos**
 
-    -   **S1:**
-        -   WER (Taxa de Erro de Palavra): **0.008**
-        -   CER (Taxa de Erro de Caractere): **0.004**
-        -   Distância: **0.332**
-    -   **S1-mini:**
-        -   WER (Taxa de Erro de Palavra): **0.011**
-        -   CER (Taxa de Erro de Caractere): **0.005**
-        -   Distância: **0.380**
-    
+<div align="center">
 
-## Aviso Legal
+### **Mídia Social**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Demo Mais Recente no X" />
+</a>
 
-Não assumimos qualquer responsabilidade por qualquer uso ilegal da base de código. Consulte suas leis locais sobre DMCA e outras leis relacionadas.
+### **Demos Interativos**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Experimente OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Experimente S1 Mini" />
+</a>
 
-## Vídeos
+### **Vitrines de Vídeo**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
 
-#### A ser continuado.
+### **Amostras de Áudio**
+<div style="margin: 20px 0;">
+    <em> Amostras de áudio de alta qualidade estarão disponíveis em breve, demonstrando nossas capacidades TTS multilíngues em diferentes idiomas e emoções.</em>
+</div>
 
-## Documentos
+</div>
 
-- [Construir Ambiente](en/install.md)
-- [Inferência](en/inference.md)
+---
+
+## Documentos
 
-Deve-se notar que o modelo atual **NÃO SUPORTA AJUSTE FINO**.
+- [Construir Ambiente](pt/install.md)
+- [Inferência](pt/inference.md)
 
 ## Créditos
 
@@ -104,6 +186,7 @@ Deve-se notar que o modelo atual **NÃO SUPORTA AJUSTE FINO**.
 - [MQTTS](https://github.com/b04901014/MQTTS)
 - [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
 - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- [Qwen3](https://github.com/QwenLM/Qwen3)
 
 ## Relatório Técnico (V1.4)
 ```bibtex

+ 120 - 37
docs/README.zh.md

@@ -26,76 +26,158 @@
     <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
         <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
     </a>
+    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
+      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    </a>
+</div>
+
+<div align="center">
+    <a target="_blank" href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena-V2">
+      <img alt="TTS-Arena2 Score" src="https://img.shields.io/badge/TTS_Arena2-Rank_%231-gold?style=flat-square&logo=trophy&logoColor=white">
+    </a>
     <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
         <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
     </a>
-    <a target="_blank" href="https://pd.qq.com/s/bwxia254o">
-      <img alt="QQ Channel" src="https://img.shields.io/badge/QQ-blue?logo=tencentqq">
+    <a target="_blank" href="https://huggingface.co/fishaudio/openaudio-s1-mini">
+        <img alt="HuggingFace Model" src="https://img.shields.io/badge/🤗%20-models-orange"/>
     </a>
 </div>
 
-此代码库在 Apache License 下发布,所有模型权重在 CC-BY-NC-SA-4.0 License 下发布。更多详情请参考 [LICENSE](../LICENSE)。
+> [!IMPORTANT]
+> **许可证声明**  
+> 此代码库在 **Apache License** 下发布,所有模型权重在 **CC-BY-NC-SA-4.0 License** 下发布。更多详情请参考 [LICENSE](../LICENSE)。
+
+> [!WARNING]
+> **法律免责声明**  
+> 我们不对代码库的任何非法使用承担责任。请参考您当地关于 DMCA 和其他相关法律的法规。
+
+---
+
+## 🎉 公告
+
+我们很高兴地宣布,我们已将品牌重塑为 **OpenAudio** —— 推出基于 Fish-Speech 基础构建的革命性新一代高级文本转语音模型系列。
+
+我们自豪地发布 **OpenAudio-S1** 作为该系列的第一个模型,在质量、性能和功能方面都有显著改进。
+
+OpenAudio-S1 提供两个版本:**OpenAudio-S1** 和 **OpenAudio-S1-mini**。两个模型现在都可以在 [Fish Audio Playground](https://fish.audio)(**OpenAudio-S1**)和 [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini)(**OpenAudio-S1-mini**)上使用。
+
+请访问 [OpenAudio 网站](https://openaudio.com/blogs/s1) 获取博客和技术报告。
+
+## 亮点 ✨
+
+### **出色的 TTS 质量**
+
+我们使用 Seed TTS 评估指标来评估模型性能,结果显示 OpenAudio S1 在英语文本上达到了 **0.008 WER** 和 **0.004 CER**,这比以前的模型显著更好。(英语,自动评估,基于 OpenAI gpt-4o-transcribe,使用 Revai/pyannote-wespeaker-voxceleb-resnet34-LM 进行说话人距离计算)
+
+| 模型 | 词错误率 (WER) | 字符错误率 (CER) | 说话人距离 |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008**  | **0.004**  | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **TTS-Arena2 最佳模型** 🏆
+
+OpenAudio S1 在 [TTS-Arena2](https://arena.speechcolab.org/) 上取得了 **第一名**,这是文本转语音评估的基准:
+
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 排名" style="width: 75%;" />
+</div>
+
+### **语音控制**
+OpenAudio S1 **支持多种情感、语调和特殊标记** 来增强语音合成:
+
+- **基础情感**:
+```
+(生气) (伤心) (兴奋) (惊讶) (满意) (高兴) 
+(害怕) (担心) (沮丧) (紧张) (挫败) (郁闷)
+(同情) (尴尬) (厌恶) (感动) (自豪) (放松)
+(感激) (自信) (感兴趣) (好奇) (困惑) (快乐)
+```
 
-我们很高兴地宣布,我们已将名字更改为 OpenAudio,这将是一个全新的文本转语音模型系列。
+- **高级情感**:
+```
+(鄙视) (不开心) (焦虑) (歇斯底里) (冷漠) 
+(不耐烦) (内疚) (轻蔑) (恐慌) (愤怒) (不情愿)
+(热衷) (不赞成) (消极) (否认) (震惊) (严肃)
+(讽刺) (安抚) (安慰) (真诚) (冷笑)
+(犹豫) (屈服) (痛苦) (尴尬) (觉得有趣)
+```
 
-演示可在 [Fish Audio Playground](https://fish.audio) 获得。
+- **语调标记**:
+```
+(急促的语调) (喊叫) (尖叫) (耳语) (柔和的语调)
+```
 
-访问 [OpenAudio 网站](https://openaudio.com) 获取博客和技术报告。
+- **特殊音频效果**:
+```
+(笑声) (轻笑) (抽泣) (大声哭泣) (叹息) (喘息)
+(呻吟) (人群笑声) (背景笑声) (观众笑声)
+```
 
-## 特性
-### OpenAudio-S1 (Fish-Speech 的新版本)
+您也可以使用 哈,哈,哈 来控制,还有许多其他情况等待您自己探索。
 
-1. 此模型具有 fish-speech 的**所有功能**。
+(目前支持英语、中文和日语,更多语言即将推出!)
 
-2. OpenAudio S1 支持多种情感、语调和特殊标记来增强语音合成:
+### **两种类型的模型**
+
+| 模型 | 大小 | 可用性 | 特性 |
+|-------|------|--------------|----------|
+| **S1** | 4B 参数 | 在 [fish.audio](https://fish.audio) 上可用 | 功能齐全的旗舰模型 |
+| **S1-mini** | 0.5B 参数 | 在 Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) 上可用 | 具有核心功能的精简版本 |
+
+S1 和 S1-mini 都集成了在线人类反馈强化学习(RLHF)。
    
-      (angry) (sad) (disdainful) (excited) (surprised) (satisfied) (unhappy) (anxious) (hysterical) (delighted) (scared) (worried) (indifferent) (upset) (impatient) (nervous) (guilty) (scornful) (frustrated) (depressed) (panicked) (furious) (empathetic) (embarrassed) (reluctant) (disgusted) (keen) (moved) (proud) (relaxed) (grateful) (confident) (interested) (curious) (confused) (joyful) (disapproving) (negative) (denying) (astonished) (serious) (sarcastic) (conciliative) (comforting) (sincere) (sneering) (hesitating) (yielding) (painful) (awkward) (amused) PS:中文也支持
+   ## **功能**
 
-   同时支持语调标记:
+1. **零样本和少样本 TTS:** 输入 10 到 30 秒的语音样本以生成高质量的 TTS 输出。**详细指南请参见 [语音克隆最佳实践](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)。**
 
-   (急促的语调) (大喊) (尖叫) (低语) (温柔的语调)
+2. **多语言和跨语言支持:** 只需将多语言文本复制并粘贴到输入框中——无需担心语言问题。目前支持英语、日语、韩语、中文、法语、德语、阿拉伯语和西班牙语。
 
-    还有一些特殊标记得到支持:
+3. **无音素依赖:** 模型具有强大的泛化能力,不依赖音素进行 TTS。它可以处理任何语言脚本的文本。
 
-    (笑声) (轻笑) (抽泣) (大声哭泣) (叹气) (喘气) (呻吟) (人群笑声) (背景笑声) (观众笑声)
+4. **高准确性:** 在 Seed-TTS Eval 上实现约 0.4% 的低 CER(字符错误率)和约 0.8% 的 WER(词错误率)。
 
-    您也可以使用 **哈,哈,哈** 来控制,还有许多其他情况等待您自己探索。
+5. **快速:** 通过 fish-tech 加速,在 Nvidia RTX 4060 笔记本电脑上实时因子约为 1:5,在 Nvidia RTX 4090 上为 1:15。
 
-3. OpenAudio S1 包含以下规模:
--   **S1 (4B, 专有):** 完整规模的模型。
--   **S1-mini (0.5B, 开源):** S1 的蒸馏版本。
+6. **WebUI 推理:** 具有易于使用的基于 Gradio 的 Web UI,兼容 Chrome、Firefox、Edge 和其他浏览器。
 
-    S1 和 S1-mini 都结合了在线人类反馈强化学习(RLHF)。
+7. **GUI 推理:** 提供与 API 服务器无缝配合的 PyQt6 图形界面。支持 Linux、Windows 和 macOS。[查看 GUI](https://github.com/AnyaCoder/fish-speech-gui)。
 
-4. 评估
+8. **部署友好:** 通过对 Linux、Windows(macOS 即将推出)的原生支持,轻松设置推理服务器,最小化速度损失。
 
-    **Seed TTS 评估指标(英语,自动评估,基于 OpenAI gpt-4o-transcribe,使用 Revai/pyannote-wespeaker-voxceleb-resnet34-LM 的说话人距离):**
+## **媒体和演示**
 
-    -   **S1:**
-        -   WER(词错误率):**0.008**
-        -   CER(字符错误率):**0.004**
-        -   距离:**0.332**
-    -   **S1-mini:**
-        -   WER(词错误率):**0.011**
-        -   CER(字符错误率):**0.005**
-        -   距离:**0.380**
-    
+<div align="center">
 
-## 免责声明
+### **社交媒体**
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="X 上的最新演示" />
+</a>
 
-我们不对代码库的任何非法使用承担责任。请参考您当地关于 DMCA 和其他相关法律的规定。
+### **交互式演示**
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="试用 OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="试用 S1 Mini" />
+</a>
 
-## 视频
+### **视频展示**
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
 
-#### 待续。
+### **音频样本**
+<div style="margin: 20px 0;">
+    <em> 展示我们跨不同语言和情感的多语言 TTS 功能的高质量音频样本即将推出。</em>
+</div>
+
+</div>
+
+---
 
 ## 文档
 
 - [构建环境](zh/install.md)
 - [推理](zh/inference.md)
 
-需要注意的是,当前模型**不支持微调**。
-
 ## 致谢
 
 - [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
@@ -104,6 +186,7 @@
 - [MQTTS](https://github.com/b04901014/MQTTS)
 - [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
 - [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- [Qwen3](https://github.com/QwenLM/Qwen3)
 
 ## 技术报告 (V1.4)
 ```bibtex

binary
docs/assets/Elo.jpg


binary
docs/assets/Thumbnail.jpg


+ 50 - 24
docs/en/index.md

@@ -28,22 +28,40 @@
 
 ---
 
-!!! warning "Legal Notice"
-    We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
-    
-    **License:** This codebase is released under Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
+!!! note "License Notice"
+    This codebase is released under the **Apache License**, and all model weights are released under the **CC-BY-NC-SA-4.0 License**. Please refer to [LICENSE](LICENSE) for more details.
+
+!!! warning "Legal Disclaimer"
+    We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
 
 ## **Introduction**
 
 We are excited to announce that we have rebranded to **OpenAudio** - introducing a brand new series of advanced Text-to-Speech models that builds upon the foundation of Fish-Speech with significant improvements and new capabilities.
 
-**Openaudio-S1-mini**: [Video](To Be Uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**OpenAudio-S1-mini**: [Blog](https://openaudio.com/blogs/s1); [Video](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **Highlights** ✨
+## **Highlights**
+
+### **Excellent TTS quality**
+
+We use Seed TTS Eval Metrics to evaluate the model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, which is significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-### **Emotion Control**
+### **Best Model in TTS-Arena2**
+
+OpenAudio S1 has achieved the **#1 ranking** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
 OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
 
 - **Basic emotions**:
@@ -63,6 +81,8 @@ OpenAudio S1 **supports a variety of emotional, tone, and special markers** to e
 (hesitating) (yielding) (painful) (awkward) (amused)
 ```
 
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
+
 - **Tone markers**:
 ```
 (in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
@@ -76,21 +96,13 @@ OpenAudio S1 **supports a variety of emotional, tone, and special markers** to e
 
 You can also use Ha,ha,ha to control the speech, and there are many other cases waiting for you to explore.
 
-### **Excellent TTS quality**
-
-We use Seed TTS Eval Metrics to evaluate the model performance, and the results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, which is significantly better than previous models. (English, auto eval, based on OpenAI gpt-4o-transcribe, speaker distance using Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+### **Two Types of Models**
 
-| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
+We offer two model variants to suit different needs:
 
-### **Two Type of Models**
+- **OpenAudio S1 (4B parameters)**: Our full-featured flagship model available on [fish.audio](https://fish.audio), delivering the highest quality speech synthesis with all advanced features.
 
-| Model | Size | Availability | Features |
-|-------|------|--------------|----------|
-| **S1** | 4B parameters | Avaliable on [fish.audio](fish.audio) | Full-featured flagship model |
-| **S1-mini** | 0.5B parameters | Avaliable on huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+- **OpenAudio S1-mini (0.5B parameters)**: A distilled version with core capabilities, available on [Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini), optimized for faster inference while maintaining excellent quality.
 
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
@@ -112,14 +124,28 @@ Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedbac
 
 8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
 
-## **Disclaimer**
+## **Media & Demos**
 
-We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.
+<!-- <div align="center"> -->
 
-## **Media & Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Latest_Demo-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+<h3><strong>Interactive Demos</strong></h3>
 
-#### 🚧 Coming Soon
-Video demonstrations and tutorials are currently in development.
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Try_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Try_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **Documentation**
 

+ 63 - 37
docs/ja/index.md

@@ -28,69 +28,81 @@
 
 ---
 
-!!! warning "法的通知"
-    このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCA(デジタルミレニアム著作権法)およびその他の関連法規をご参照ください。
-    
-    **ライセンス:** このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
+!!! note "ライセンス通知"
+    このコードベースは **Apacheライセンス** の下でリリースされ、すべてのモデル重みは **CC-BY-NC-SA-4.0ライセンス** の下でリリースされています。詳細は [LICENSE](LICENSE) を参照してください。
+
+!!! warning "法的免責事項"
+    コードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCAおよびその他の関連法規をご参照ください。
 
 ## **紹介**
 
 私たちは **OpenAudio** への改名を発表できることを嬉しく思います。Fish-Speechを基盤とし、大幅な改善と新機能を加えた、新しい先進的なText-to-Speechモデルシリーズを紹介します。
 
-**Openaudio-S1-mini**: [動画](アップロード予定); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**Openaudio-S1-mini**: [ブログ](https://openaudio.com/blogs/s1); [動画](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [動画](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **ハイライト** ✨
+## **ハイライト**
+
+### **優秀なTTS品質**
+
+Seed TTS評価指標を使用してモデルのパフォーマンスを評価した結果、OpenAudio S1は英語テキストで**0.008 WER**と**0.004 CER**を達成し、以前のモデルより大幅に改善されました。(英語、自動評価、OpenAI gpt-4o-transcribeに基づく、話者距離はRevai/pyannote-wespeaker-voxceleb-resnet34-LM使用)
+
+| モデル | 単語誤り率 (WER) | 文字誤り率 (CER) | 話者距離 |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-### **感情制御**
+### **TTS-Arena2最高モデル**
+
+OpenAudio S1は[TTS-Arena2](https://arena.speechcolab.org/)で**#1ランキング**を達成しました。これはtext-to-speech評価のベンチマークです:
+
+<div align="center">
+    <img src="../assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **音声制御**
 OpenAudio S1は**多様な感情、トーン、特殊マーカーをサポート**して音声合成を強化します:
 
 - **基本感情**:
 ```
-(angry) (sad) (excited) (surprised) (satisfied) (delighted)
-(scared) (worried) (upset) (nervous) (frustrated) (depressed)
-(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
-(grateful) (confident) (interested) (curious) (confused) (joyful)
+(怒った) (悲しい) (興奮した) (驚いた) (満足した) (喜んだ) 
+(怖がった) (心配した) (動揺した) (緊張した) (欲求不満な) (落ち込んだ)
+(共感した) (恥ずかしい) (嫌悪した) (感動した) (誇らしい) (リラックスした)
+(感謝した) (自信のある) (興味のある) (好奇心のある) (困惑した) (楽しい)
 ```
 
 - **高度な感情**:
 ```
-(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
-(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
-(keen) (disapproving) (negative) (denying) (astonished) (serious)
-(sarcastic) (conciliative) (comforting) (sincere) (sneering)
-(hesitating) (yielding) (painful) (awkward) (amused)
+(軽蔑的な) (不幸な) (不安な) (ヒステリックな) (無関心な) 
+(いらいらした) (罪悪感のある) (軽蔑的な) (パニックした) (激怒した) (不本意な)
+(熱心な) (不賛成の) (否定的な) (否定する) (驚いた) (真剣な)
+(皮肉な) (和解的な) (慰める) (誠実な) (冷笑的な)
+(躊躇する) (譲歩する) (痛々しい) (気まずい) (面白がった)
 ```
 
+(現在英語、中国語、日本語をサポート、より多くの言語が近日公開予定!)
+
 - **トーンマーカー**:
 ```
-(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+(急いだ調子で) (叫んで) (悲鳴をあげて) (ささやいて) (柔らかい調子で)
 ```
 
 - **特殊音響効果**:
 ```
-(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
-(groaning) (crowd laughing) (background laughter) (audience laughing)
+(笑って) (くすくす笑って) (すすり泣いて) (大声で泣いて) (ため息をついて) (息を切らして)
+(うめいて) (群衆の笑い声) (背景の笑い声) (観客の笑い声)
 ```
 
 Ha,ha,haを使用してコントロールすることもでき、他にも多くの使用法があなた自身の探索を待っています。
 
-### **優秀なTTS品質**
-
-Seed TTS評価指標を使用してモデルのパフォーマンスを評価した結果、OpenAudio S1は英語テキストで**0.008 WER**と**0.004 CER**を達成し、以前のモデルより大幅に改善されました。(英語、自動評価、OpenAI gpt-4o-転写に基づく、話者距離はRevai/pyannote-wespeaker-voxceleb-resnet34-LM使用)
+### **2つのモデルタイプ**
 
-| モデル | 単語誤り率 (WER) | 文字誤り率 (CER) | 話者距離 |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
+異なるニーズに対応する2つのモデルバリエーションを提供しています:
 
-### **2つのモデルタイプ**
+- **OpenAudio S1 (40億パラメータ)**:[fish.audio](https://fish.audio) で利用可能な全機能搭載のフラッグシップモデルで、すべての高度な機能を備えた最高品質の音声合成を提供します。
 
-| モデル | サイズ | 利用可能性 | 特徴 |
-|-------|------|--------------|----------|
-| **S1** | 40億パラメータ | [fish.audio](fish.audio) で利用可能 | 全機能搭載のフラッグシップモデル |
-| **S1-mini** | 5億パラメータ | huggingface [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) で利用可能 | コア機能を備えた蒸留版 |
+- **OpenAudio S1-mini (5億パラメータ)**:コア機能を備えた蒸留版で、[Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) で利用可能です。優秀な品質を維持しながら、より高速な推論のために最適化されています。
 
 S1とS1-miniの両方にオンライン人間フィードバック強化学習(RLHF)が組み込まれています。
 
@@ -110,16 +122,30 @@ S1とS1-miniの両方にオンライン人間フィードバック強化学習
 
 7. **GUI推論:** APIサーバーとシームレスに連携するPyQt6グラフィカルインターフェースを提供します。Linux、Windows、macOSをサポートします。[GUIを見る](https://github.com/AnyaCoder/fish-speech-gui)。
 
-8. **デプロイフレンドリー:** Linux、Windows、MacOSの native サポートで推論サーバーを簡単にセットアップし、速度低下を最小化します。
+8. **デプロイフレンドリー:** Linux、Windows(MacOS近日公開)のネイティブサポートで推論サーバーを簡単にセットアップし、速度低下を最小化します。
 
-## **免責事項**
+## **メディア・デモ**
 
-コードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCAやその他の関連法律をご参照ください。
+<!-- <div align="center"> -->
 
-## **メディア・デモ**
+<h3><strong>ソーシャルメディア</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-最新デモ-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+<h3><strong>インタラクティブデモ</strong></h3>
 
-#### 🚧 近日公開
-動画デモとチュートリアルは現在開発中です。
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-OpenAudio_S1を試す-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-S1_Miniを試す-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>動画ショーケース</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **ドキュメント**
 

+ 73 - 36
docs/ko/index.md

@@ -4,7 +4,15 @@
 
 <div align="center">
 
-<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
+<img src="../assets/openaudio.jpg" alt="OpenAudio" style="display: block; margin: 0 auto; width: 35%;"/>
 
 </div>
 
@@ -28,70 +36,85 @@
 
 ---
 
-!!! warning "법적 고지"
-    코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다.
-    
-    **라이선스:** 이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
+!!! note "라이선스 안내"
+    이 코드베이스는 **Apache 라이선스** 하에 배포되며, 모든 모델 가중치는 **CC-BY-NC-SA-4.0 라이선스** 하에 배포됩니다. 자세한 내용은 [LICENSE](LICENSE)를 참조하세요.
+
+!!! warning "법적 면책조항"
+    코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA 및 기타 관련 법률을 참고하시기 바랍니다.
 
 ## **소개**
 
 저희는 **OpenAudio**로의 브랜드 변경을 발표하게 되어 기쁩니다. Fish-Speech를 기반으로 하여 상당한 개선과 새로운 기능을 추가한 새로운 고급 텍스트-음성 변환 모델 시리즈를 소개합니다.
 
-**Openaudio-S1-mini**: [동영상](업로드 예정); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**Openaudio-S1-mini**: [블로그](https://openaudio.com/blogs/s1); [동영상](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [동영상](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **주요 특징**
+## **주요 특징**
 
-### **감정 제어**
+### **뛰어난 TTS 품질**
+
+Seed TTS 평가 지표를 사용하여 모델 성능을 평가한 결과, OpenAudio S1은 영어 텍스트에서 **0.008 WER**과 **0.004 CER**을 달성하여 이전 모델보다 현저히 향상되었습니다. (영어, 자동 평가, OpenAI gpt-4o-transcribe 기반, 화자 거리는 Revai/pyannote-wespeaker-voxceleb-resnet34-LM 사용)
+
+| 모델 | 단어 오류율 (WER) | 문자 오류율 (CER) | 화자 거리 |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **TTS-Arena2 최고 모델**
+
+OpenAudio S1은 [TTS-Arena2](https://arena.speechcolab.org/)에서 **#1 순위**를 달성했습니다. 이는 텍스트 음성 변환 평가의 기준입니다:
+
+<div align="center">
+    <img src="assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **음성 제어**
 OpenAudio S1은 **다양한 감정, 톤, 특수 마커를 지원**하여 음성 합성을 향상시킵니다:
 
 - **기본 감정**:
 ```
-(angry) (sad) (excited) (surprised) (satisfied) (delighted)
-(scared) (worried) (upset) (nervous) (frustrated) (depressed)
-(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
-(grateful) (confident) (interested) (curious) (confused) (joyful)
+(화난) (슬픈) (흥미진진한) (놀란) (만족한) (기쁜) 
+(무서워하는) (걱정하는) (속상한) (긴장한) (좌절한) (우울한)
+(공감하는) (당황한) (역겨워하는) (감동한) (자랑스러운) (편안한)
+(감사한) (자신감있는) (관심있는) (호기심있는) (혼란스러운) (즐거운)
 ```
 
 - **고급 감정**:
 ```
-(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
-(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
-(keen) (disapproving) (negative) (denying) (astonished) (serious)
-(sarcastic) (conciliative) (comforting) (sincere) (sneering)
-(hesitating) (yielding) (painful) (awkward) (amused)
+(경멸하는) (불행한) (불안한) (히스테리컬한) (무관심한) 
+(참을성없는) (죄책감있는) (멸시하는) (공황상태의) (격분한) (마지못한)
+(열망하는) (불찬성하는) (부정적인) (부인하는) (놀란) (진지한)
+(비꼬는) (화해하는) (위로하는) (진실한) (비웃는)
+(주저하는) (굴복하는) (고통스러운) (어색한) (재미있어하는)
 ```
 
+(현재 영어, 중국어, 일본어를 지원하며, 더 많은 언어가 곧 출시될 예정입니다!)
+
 - **톤 마커**:
 ```
-(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+(서두르는 톤으로) (소리치며) (비명지르며) (속삭이며) (부드러운 톤으로)
 ```
 
 - **특수 음향 효과**:
 ```
-(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
-(groaning) (crowd laughing) (background laughter) (audience laughing)
+(웃으며) (킥킥거리며) (흐느끼며) (크게 울며) (한숨쉬며) (헐떡이며)
+(신음하며) (군중 웃음소리) (배경 웃음소리) (관객 웃음소리)
 ```
 
 Ha,ha,ha를 사용하여 제어할 수도 있으며, 여러분 스스로 탐구할 수 있는 다른 많은 사용법이 있습니다.
 
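-### **Excellent TTS Quality**
-
-We use the Seed TTS Eval metrics to evaluate model performance. OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
-
-| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |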
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
-
 ### **Two Model Types**
 
+<div align="center">
+
 | Model | Size | Availability | Features |
 |-------|------|--------------|----------|
-| **S1** | 4B parameters | Available on [fish.audio](fish.audio) | Full-featured flagship model |
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
 | **S1-mini** | 0.5B parameters | Available on Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Lightweight version with core capabilities |
 
+</div>
+
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
 ## **Features**
@@ -110,16 +133,30 @@ S1과 S1-mini 모두 온라인 인간 피드백 강화 학습(RLHF)이 통합되
 
 7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
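-8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.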
 
-## **Disclaimer**
+## **Media & Demos**
 
-We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about the DMCA and other related laws.
+<!-- <div align="center"> -->
 
-## **Media & Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-최신_데모-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
 
-#### 🚧 Coming Soon
-Video demos and tutorials are currently under development.
+<h3><strong>Interactive Demos</strong></h3>
+
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-OpenAudio_S1_체험-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-S1_Mini_체험-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **문서**
 

+ 63 - 37
docs/pt/index.md

@@ -28,69 +28,81 @@
 
 ---
 
+!!! note "License Notice"
+    This codebase is released under the **Apache License** and all model weights are released under the **CC-BY-NC-SA-4.0 License**. Please refer to [LICENSE](LICENSE) for more details.
+
 !!! warning "Legal Disclaimer"
-    We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about the DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
-    
-    **License:** This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
+    We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about the DMCA and other relevant laws.
 
 ## **Introduction**
 
 We are excited to announce that we have rebranded to **OpenAudio**, introducing a new series of advanced text-to-speech models that builds on the foundation of Fish-Speech with significant improvements and new capabilities.
 
-**Openaudio-S1-mini**: [Video](To be uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**OpenAudio-S1-mini**: [Blog](https://openaudio.com/blogs/s1); [Video](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
-## **Highlights** ✨
+## **Highlights**
+
+### **Excellent TTS Quality**
+
+We use the Seed TTS Eval metrics to evaluate model performance. OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-### **Emotion Control**
-OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
+### **Best Model on TTS-Arena2**
+
+OpenAudio S1 ranks **#1** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="../assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
+OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
 
 - **Basic emotions**:
 ```
-(angry) (sad) (excited) (surprised) (satisfied) (delighted)
-(scared) (worried) (upset) (nervous) (frustrated) (depressed)
-(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
-(grateful) (confident) (interested) (curious) (confused) (joyful)
+(raivoso) (triste) (animado) (surpreso) (satisfeito) (encantado) 
+(com medo) (preocupado) (chateado) (nervoso) (frustrado) (deprimido)
+(empático) (envergonhado) (nojento) (comovido) (orgulhoso) (relaxado)
+(grato) (confiante) (interessado) (curioso) (confuso) (alegre)
 ```
 
 - **Advanced emotions**:
 ```
-(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
-(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
-(keen) (disapproving) (negative) (denying) (astonished) (serious)
-(sarcastic) (conciliative) (comforting) (sincere) (sneering)
-(hesitating) (yielding) (painful) (awkward) (amused)
+(desdenhoso) (infeliz) (ansioso) (histérico) (indiferente) 
+(impaciente) (culpado) (desprezível) (em pânico) (furioso) (relutante)
+(entusiasmado) (desaprovador) (negativo) (negando) (espantado) (sério)
+(sarcástico) (conciliador) (consolador) (sincero) (zombeteiro)
+(hesitante) (cedendo) (doloroso) (constrangido) (divertido)
 ```
 
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
+
 - **Tone markers**:
 ```
-(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+(em tom de pressa) (gritando) (berrando) (sussurrando) (tom suave)
 ```
 
 - **Special sound effects**:
 ```
-(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
-(groaning) (crowd laughing) (background laughter) (audience laughing)
+(rindo) (gargalhando) (soluçando) (chorando alto) (suspirando) (ofegante)
+(gemendo) (risada da multidão) (risada de fundo) (risada da plateia)
 ```
 
 You can also use Ha,ha,ha to control the output; many other uses are left for you to explore.
 
-### **Excellent TTS Quality**
-
-We use the Seed TTS Eval metrics to evaluate model performance. OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+### **Two Model Types**
 
-| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
+We offer two model variants to suit different needs:
 
-### **Two Model Types**
+- **OpenAudio S1 (4B parameters)**: Our full-featured flagship model, available on [fish.audio](https://fish.audio), offering the highest-quality speech synthesis with all advanced features.
 
-| Model | Size | Availability | Features |
-|-------|------|--------------|----------|
-| **S1** | 4B parameters | Available on [fish.audio](fish.audio) | Full-featured flagship model |
-| **S1-mini** | 0.5B parameters | Available on Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+- **OpenAudio S1-mini (0.5B parameters)**: A distilled version with core capabilities, available on [Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini), optimized for faster inference while maintaining excellent quality.
 
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
@@ -110,16 +122,30 @@ Tanto o S1 quanto o S1-mini incorporam Aprendizado por Reforço Online com Feedb
 
 7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
-8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
 
-## **Disclaimer**
+## **Media & Demos**
 
-We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about the DMCA and other related laws.
+<!-- <div align="center"> -->
 
-## **Media & Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-Demo_Mais_Recente-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+<h3><strong>Interactive Demos</strong></h3>
 
-#### 🚧 Coming Soon
-Video demos and tutorials are currently under development.
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-Experimente_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-Experimente_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **Documentação**
 

+ 64 - 38
docs/zh/index.md

@@ -20,7 +20,7 @@
 <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
 <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
 </a>
-</div>
+</div>
 
 <strong>Try it now:</strong> <a href="https://fish.audio">Fish Audio Playground</a> | <strong>Learn more:</strong> <a href="https://openaudio.com">OpenAudio Website</a>
 
@@ -28,69 +28,81 @@
 
 ---
 
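-!!! warning "Legal Notice"
-    We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about the DMCA (Digital Millennium Copyright Act) and other relevant laws.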
-    
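-    **License:** This codebase is released under the Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
+!!! note "License Notice"
+    This codebase is released under the **Apache License** and all model weights are released under the **CC-BY-NC-SA-4.0 License**. Please refer to [LICENSE](LICENSE) for more details.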
+
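+!!! warning "Legal Disclaimer"
+    We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about the DMCA and other relevant laws.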
 
 ## **Introduction**
 
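 We are excited to announce that we have rebranded to **OpenAudio**, introducing a new series of advanced text-to-speech models that builds on the foundation of Fish-Speech with significant improvements and new capabilities.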
 
-**Openaudio-S1-mini**: [Video](To be uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
+**OpenAudio-S1-mini**: [Blog](https://openaudio.com/blogs/s1); [Video](https://www.youtube.com/watch?v=SYuPvd7m06A); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
 
 **Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
 
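-## **Highlights** ✨
+## **Highlights**
+
+### **Excellent TTS Quality**
+
+We use the Seed TTS Eval metrics to evaluate model performance. OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |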
+|:-----:|:--------------------:|:-------------------------:|:----------------:|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
 
-### **Emotion Control**
+### **Best Model on TTS-Arena2**
+
+OpenAudio S1 ranks **#1** on [TTS-Arena2](https://arena.speechcolab.org/), the benchmark for text-to-speech evaluation:
+
+<div align="center">
+    <img src="../assets/Elo.jpg" alt="TTS-Arena2 Ranking" style="width: 75%;" />
+</div>
+
+### **Speech Control**
 OpenAudio S1 **supports a variety of emotional, tone, and special markers** to enhance speech synthesis:
 
 - **Basic emotions**:
 ```
-(angry) (sad) (excited) (surprised) (satisfied) (delighted)
-(scared) (worried) (upset) (nervous) (frustrated) (depressed)
-(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
-(grateful) (confident) (interested) (curious) (confused) (joyful)
+(生气) (伤心) (兴奋) (惊讶) (满意) (高兴) 
+(害怕) (担心) (沮丧) (紧张) (失望) (沮丧)
+(共情) (尴尬) (厌恶) (感动) (自豪) (放松)
+(感激) (自信) (感兴趣) (好奇) (困惑) (快乐)
 ```
 
 - **Advanced emotions**:
 ```
-(disdainful) (unhappy) (anxious) (hysterical) (indifferent) 
-(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
-(keen) (disapproving) (negative) (denying) (astonished) (serious)
-(sarcastic) (conciliative) (comforting) (sincere) (sneering)
-(hesitating) (yielding) (painful) (awkward) (amused)
+(鄙视) (不高兴) (焦虑) (歇斯底里) (漠不关心) 
+(不耐烦) (内疚) (轻蔑) (恐慌) (愤怒) (不情愿)
+(渴望) (不赞成) (否定) (否认) (惊讶) (严肃)
+(讽刺) (和解) (安慰) (真诚) (冷笑)
+(犹豫) (让步) (痛苦) (尴尬) (开心)
 ```
 
+(English, Chinese, and Japanese are supported now, with more languages coming soon!)
+
 - **Tone markers**:
 ```
-(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+(匆忙的语调) (大喊) (尖叫) (耳语) (轻声)
 ```
 
 - **Special sound effects**:
 ```
-(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
-(groaning) (crowd laughing) (background laughter) (audience laughing)
+(笑) (轻笑) (抽泣) (大哭) (叹气) (喘气)
+(呻吟) (群体笑声) (背景笑声) (观众笑声)
 ```
 
 You can also use Ha,ha,ha to control the output; many other uses are left for you to explore.
 
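-### **Outstanding TTS Quality**
-
-We use the Seed TTS Eval metrics to evaluate model performance. OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance measured with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+### **Two Model Types**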
 
-| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
-|-------|----------------------|---------------------------|------------------|
-| **S1** | **0.008**  | **0.004**  | **0.332** |
-| **S1-mini** | **0.011** | **0.005** | **0.380** |
+We offer two model variants to suit different needs:
 
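-### **Two Model Types**
+- **OpenAudio S1 (4B parameters)**: Our full-featured flagship model, available on [fish.audio](https://fish.audio), offering the highest-quality speech synthesis with all advanced features.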
 
-| Model | Size | Availability | Features |
-|-------|------|--------------|----------|
-| **S1** | 4B parameters | Available on [fish.audio](fish.audio) | Full-featured flagship model |
-| **S1-mini** | 0.5B parameters | Available on Hugging Face [hf space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+- **OpenAudio S1-mini (0.5B parameters)**: A distilled version with core capabilities, available on [Hugging Face Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini), optimized for faster inference while maintaining excellent quality.
 
 Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
 
@@ -110,16 +122,30 @@ S1 和 S1-mini 都集成了在线人类反馈强化学习 (RLHF)。
 
 7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
 
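-8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.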
 
-## **Disclaimer**
+## **Media & Demos**
 
-We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about the DMCA and other related laws.
+<!-- <div align="center"> -->
 
-## **Media & Demos**
+<h3><strong>Social Media</strong></h3>
+<a href="https://x.com/FishAudio/status/1929915992299450398" target="_blank">
+    <img src="https://img.shields.io/badge/𝕏-最新演示-black?style=for-the-badge&logo=x&logoColor=white" alt="Latest Demo on X" />
+</a>
+
+<h3><strong>Interactive Demos</strong></h3>
 
-#### 🚧 Coming Soon
-Video demos and tutorials are currently under development.
+<a href="https://fish.audio" target="_blank">
+    <img src="https://img.shields.io/badge/Fish_Audio-试用_OpenAudio_S1-blue?style=for-the-badge" alt="Try OpenAudio S1" />
+</a>
+<a href="https://huggingface.co/spaces/fishaudio/openaudio-s1-mini" target="_blank">
+    <img src="https://img.shields.io/badge/Hugging_Face-试用_S1_Mini-yellow?style=for-the-badge" alt="Try S1 Mini" />
+</a>
+
+<h3><strong>Video Showcases</strong></h3>
+<div align="center">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/SYuPvd7m06A" title="OpenAudio S1 Video" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
+</div>
 
 ## **Documentation**