This page covers server-side inference for Fish Audio S2, plus quick links for WebUI inference and Docker deployment.
Fish Speech provides an HTTP API server entrypoint at tools/api_server.py.
python tools/api_server.py \
--llama-checkpoint-path checkpoints/s2-pro \
--decoder-checkpoint-path checkpoints/s2-pro/codec.pth \
--listen 0.0.0.0:8080
Common options:
--compile: enable torch.compile optimization
--half: use fp16 mode
--api-key: require bearer token authentication
--workers: set worker process count

Check that the server is running:

curl -X GET http://127.0.0.1:8080/v1/health
Expected response:
{"status":"ok"}
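The health check above can be scripted, which is handy when another process must wait for the server to finish loading checkpoints. A minimal Python sketch; the URL and the {"status":"ok"} body come from the example above, while the polling helper and its defaults are our own:

```python
import json
import time
import urllib.error
import urllib.request

def is_healthy(body: str) -> bool:
    """Return True if a health-endpoint response body reports status ok."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        return False

def wait_for_server(url: str = "http://127.0.0.1:8080/v1/health",
                    timeout: float = 30.0) -> bool:
    """Poll the health endpoint until it answers ok or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if is_healthy(resp.read().decode("utf-8")):
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet; retry
        time.sleep(1.0)
    return False
```

Model loading can take a while on first start, so a generous timeout is usually worth it.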
The server exposes three endpoints:

POST /v1/tts for text-to-speech generation
POST /v1/vqgan/encode for VQ encoding
POST /v1/vqgan/decode for VQ decoding

The base TTS model is selected when the server starts. In the example above, the server was started with the checkpoints/s2-pro weights, so every request sent to http://127.0.0.1:8080/v1/tts will use S2-Pro automatically; there is no separate per-request model field in tools/api_client.py for local server calls.
python tools/api_client.py \
--url http://127.0.0.1:8080/v1/tts \
--text "Hello from Fish Speech" \
--output s2-pro-demo
If you want to select a saved reference voice, use --reference_id. This chooses the voice reference, not the base TTS model:
python tools/api_client.py \
--url http://127.0.0.1:8080/v1/tts \
--text "Hello from Fish Speech" \
--reference_id my-speaker \
--output s2-pro-demo
For WebUI usage, see:
For Docker-based server or WebUI deployment, see:
You can also start the server profile directly with Docker Compose:
docker compose --profile server up
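The server profile implies a compose file whose server service is tagged with that profile. A minimal sketch of what such a file could look like; the image tag, mount paths, and service name are placeholders, so check the repository's own docker-compose.yml rather than copying this verbatim:

```yaml
services:
  server:
    image: fishaudio/fish-speech:latest   # placeholder tag; use the image from the repo's compose file
    profiles: [server]                    # started only by: docker compose --profile server up
    command: >
      python tools/api_server.py
      --llama-checkpoint-path checkpoints/s2-pro
      --decoder-checkpoint-path checkpoints/s2-pro/codec.pth
      --listen 0.0.0.0:8080
    ports:
      - "8080:8080"
    volumes:
      - ./checkpoints:/app/checkpoints    # placeholder mount point for model weights
```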