Fish Speech

**English** | [简体中文](docs/README.zh.md) | [Portuguese](docs/README.pt-BR.md) | [日本語](docs/README.ja.md)

This codebase and all models are released under the CC-BY-NC-SA-4.0 License. Please refer to LICENSE for details.

Features

Zero-shot & Few-shot TTS: Input a 10 to 30-second vocal sample to generate high-quality TTS output. For detailed guidelines, see Voice Cloning Best Practices.
Multilingual & Cross-lingual Support: Simply copy and paste multilingual text into the input box—no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
No Phoneme Dependency: The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
Highly Accurate: Achieves a low CER (Character Error Rate) and WER (Word Error Rate) of around 2% for 5-minute English texts.
Fast: With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
WebUI Inference: Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.
GUI Inference: Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. See GUI.
Deploy-Friendly: Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.
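
For readers unfamiliar with the metrics quoted in the feature list, the following is a minimal sketch of how CER, WER, and a real-time factor are commonly computed. It is not part of the Fish Speech codebase: the function names (`edit_distance`, `wer`, `cer`, `real_time_factor`) are hypothetical, the error rates use the standard Levenshtein-distance definition, and the ratio "1:5" is read here as processing time to generated audio duration, which is an assumption about the README's notation.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Assumed reading of '1:5': processing time divided by generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # One wrong word out of four, and one wrong character out of nineteen.
    print(wer("the quick brown fox", "the quick brown box"))
    print(cer("the quick brown fox", "the quick brown box"))
    # 2 s of compute for 10 s of audio corresponds to a 1:5 ratio.
    print(real_time_factor(2.0, 10.0))
```

In practice the hypothesis text would come from transcribing the synthesized audio with a speech recognizer and comparing it against the input text; that step is omitted here.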