✅ Key Highlights
Xinference Model Hub Official Launch
Pull the latest models from model.xinference.io without waiting for a new Xinference release to add them.
Documentation: Model Update Guide
Embedding Auto-Batch Support
Multiple concurrent embedding requests are automatically batched, significantly improving throughput:
- Parallel requests automatically aggregated into efficient batches
- Transparent to applications - no code changes required
- Average response time reduced by up to 10x
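Conceptually, the auto-batching behaves like a micro-batcher: requests that arrive close together are merged into a single model call. Below is a minimal, self-contained sketch of the idea; the `AutoBatcher` class and the `fake_embed` stand-in are illustrative only, not Xinference's actual implementation:

```python
import queue
import threading
import time

class AutoBatcher:
    """Sketch of server-side auto-batching: concurrent embed() calls are
    queued, aggregated during a short grace window, and sent to the model
    as one batch. Names here are illustrative, not Xinference internals."""

    def __init__(self, embed_fn, max_batch=32, wait_s=0.05):
        self._embed_fn = embed_fn    # model call: list[str] -> list[vector]
        self._max_batch = max_batch
        self._wait_s = wait_s
        self._queue = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def embed(self, text):
        """Called by each request thread; blocks until its vector is ready."""
        done = threading.Event()
        slot = {}
        self._queue.put((text, slot, done))
        done.wait()
        return slot["vector"]

    def _loop(self):
        while True:
            batch = [self._queue.get()]   # block until the first request
            time.sleep(self._wait_s)      # grace window to aggregate more
            while len(batch) < self._max_batch:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            # One model call serves the whole batch
            vectors = self._embed_fn([text for text, _, _ in batch])
            for (_, slot, done), vec in zip(batch, vectors):
                slot["vector"] = vec
                done.set()

# Stand-in "model": the embedding of a text is just [len(text)]
batch_sizes = []

def fake_embed(texts):
    batch_sizes.append(len(texts))
    return [[float(len(t))] for t in texts]

batcher = AutoBatcher(fake_embed)

results = {}
def request(text):
    results[text] = batcher.embed(text)

threads = [threading.Thread(target=request, args=(f"doc-{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the callers are ordinary blocking requests, no application-side changes are needed; the aggregation happens entirely behind the `embed()` interface.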
🌐 Community Edition Updates
Installation
pip install:
pip install 'xinference==1.13.0'
Docker:
Pull the latest image (e.g. `docker pull xprobe/xinference:latest`) or run the pip command above inside an existing container.
🆕 New Model Support
- Qwen3-VL-MLX (Multimodal support with MLX framework)
✨ New Features
- Auto-batch embedding
- Update models from Xinference Model Hub
- Support for updating model JSON metadata
🛠 Enhancements
- IndexTTS2 streaming output support
- IndexTTS2 offline deployment support
- Added an embedding benchmark
- Fixed CI build failures caused by the PEFT version
🐞 Bug Fixes
- Fixed DeepSeek-OCR runtime exceptions in Docker
- Tool call ID uses UUID to avoid duplicates
- Fixed cache list display issues for audio/video/image models
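The tool call ID fix above follows the standard pattern of deriving IDs from random UUIDs. A minimal sketch; the `call_` prefix mirrors the common OpenAI-style convention and is an assumption here, not Xinference's exact format:

```python
import uuid

def new_tool_call_id() -> str:
    # A random UUIDv4 makes collisions between concurrent tool calls
    # vanishingly unlikely, unlike counters or timestamps, which can
    # repeat across parallel requests.
    return f"call_{uuid.uuid4().hex}"

# Generating many IDs yields no duplicates
ids = {new_tool_call_id() for _ in range(10_000)}
```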
📚 Documentation Updates
🏢 Enterprise Edition Updates
Added MinerU 2.5 Support
More powerful PDF/document parsing capabilities
Added paddleocr-vl Support
An integrated OCR and visual understanding model, suitable for a wider range of business scenarios
System Stability Enhancements
Fixed multiple issues to improve reliability in large-scale cluster deployments