🚀 Xinference v2.1.0 Release Notes
✅ Highlights
🧠 GLM-4.7 / GLM-4.7-Flash Support
Added full support for GLM-4.7 and GLM-4.7-Flash, further expanding the GLM model ecosystem.
🎤 Qwen3-ASR Series Launched
New additions:
- Qwen3-ASR-0.6B
- Qwen3-ASR-1.7B
Full support for the Qwen3-ASR speech recognition series, covering both lightweight and high-performance scenarios.
🖼️ FLUX.2-Klein Series Support
New additions:
- FLUX.2-Klein-4B
- FLUX.2-Klein-9B
These models enhance image generation and editing capabilities, further extending FLUX ecosystem support.
🔁 MinerU2.5-2509-1.2B Adjustment
Updated the MinerU2.5-2509-1.2B model, refining its configuration and adaptation process.
🌐 Community Edition Updates
📦 Installation
- pip:
pip install 'xinference==2.1.0'
- Docker: Pull the latest image, or update via pip inside the container.
🆕 New Model Support
- GLM-4.7
- GLM-4.7-Flash
- Qwen3-ASR-0.6B / 1.7B
- FLUX.2-Klein-4B / 9B
🛠 Enhancements
- Updated DeepSeek-V3.2 / DeepSeek-V3.2-Exp model configurations.
- Optimized image build dependencies (constrained setuptools < 82).
- Refactored API layer structure:
- Extracted Pydantic request Schemas.
- Modularized route registration for clearer code structure.
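The refactor above follows a common FastAPI-style pattern: request bodies live in standalone Pydantic schema modules, and each feature area registers its routes through its own router. A minimal sketch of the schema-extraction half, assuming Pydantic is available; the `ChatCompletionRequest` fields here are illustrative, not Xinference's actual schema:

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


# Hypothetical request schema, extracted into its own module so that
# route handlers declare a typed body instead of parsing raw JSON.
class Message(BaseModel):
    role: str
    content: str


class ChatCompletionRequest(BaseModel):
    model: str
    messages: List[Message]
    temperature: float = Field(default=1.0, ge=0.0, le=2.0)
    max_tokens: Optional[int] = None


# Validation happens declaratively when the schema is instantiated.
req = ChatCompletionRequest(
    model="glm-4.7",
    messages=[{"role": "user", "content": "hello"}],
)
print(req.temperature)  # default of 1.0 applied

# Out-of-range values are rejected by the field constraints.
try:
    ChatCompletionRequest(model="m", messages=[], temperature=5.0)
except ValidationError:
    print("temperature out of range rejected")
```

Keeping schemas in their own modules lets every route module import the same request types, which is what makes the per-feature route registration clean to split up.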
🐞 Bug Fixes
- Fixed vLLM embedding model error.
- Fixed vLLM reranker scoring anomaly.
- Fixed vLLM reranker GPU release issue.
- Added compatibility with vLLM's async tokenizer logic.
- Fixed CI issues related to setuptools.
📚 Documentation
- Added v2.0.0 release notes.
🏢 Enterprise Edition Updates
🔧 Stability Enhancements
Includes multiple underlying optimizations and bug fixes to improve overall runtime stability and enterprise deployment reliability.
Upgrade today and experience Xinference v2.1.0 🚀