Infra


2025-10-13

vLLM

vLLM is a fast and easy-to-use library for LLM inference and serving.

Used for non-GGUF format models; Linux only.

LLaMA Box is an LM inference server (pure API, without frontend assets) based on llama.cpp and stable-diffusion.cpp.

Used for GGUF format models; supports Linux, macOS, and Windows.

A text-to-speech and speech-to-text server compatible with the OpenAI API, powered by backend support from Whisper, FunASR, Bark, Dia and CosyVoice.

Used for non-GGUF speech models; supports only NVIDIA GPUs and CPUs.

Execution and deployment platform for inference services, enabling efficient application implementation.

Used for non-GGUF format models; supports only Ascend 910B and 310P.
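
The constraints listed for the four backends can be read as a small decision table. The following is a minimal sketch of that table in Python, not an API of any of these projects: the function name `candidate_backends`, the labels "speech server" and "Ascend platform", and the accelerator strings are all hypothetical and chosen only for illustration. It encodes just the constraints stated in these notes, so more than one backend may match.

```python
def candidate_backends(model_format, os_name, accelerator, speech=False):
    """Return labels of backends whose constraints, as stated in these notes, match.

    All argument values and returned labels are illustrative assumptions.
    """
    matches = []

    # GGUF models: LLaMA Box on Linux, macOS, or Windows.
    if model_format.lower() == "gguf":
        if os_name in {"linux", "macos", "windows"}:
            matches.append("LLaMA Box")
        return matches

    # Non-GGUF speech models: the speech server, on NVIDIA GPUs or CPUs only.
    if speech:
        if accelerator in {"nvidia", "cpu"}:
            matches.append("speech server")
        return matches

    # Other non-GGUF models: the Ascend platform on 910B/310P, vLLM on Linux.
    if accelerator in {"ascend-910b", "ascend-310p"}:
        matches.append("Ascend platform")
    if os_name == "linux":
        matches.append("vLLM")
    return matches


print(candidate_backends("gguf", "macos", "cpu"))
print(candidate_backends("safetensors", "linux", "nvidia"))
```

An empty result means none of the listed backends covers that combination, for example a non-GGUF model on Windows with an NVIDIA GPU.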