✅ Highlights
🧩 VASTAI GPU (VACC) Support
Added support for VASTAI GPUs (VACC), extended to VLM (Vision Language Model) scenarios, further expanding the supported hardware ecosystem.
🍎 Apple MLX Backend - Continuous Batching
MLX chat models now support continuous batching, enabling concurrent request processing and significantly improving throughput and concurrency performance.
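With continuous batching, the server can interleave many in-flight requests instead of serving them strictly one after another. A minimal client-side sketch of issuing concurrent chat requests, assuming a local Xinference server exposing its OpenAI-compatible endpoint at `http://127.0.0.1:9997/v1` and an MLX chat model with the hypothetical UID `my-mlx-chat`:

```python
import concurrent.futures
import json
from urllib import request

ENDPOINT = "http://127.0.0.1:9997/v1/chat/completions"  # assumed local server address
MODEL_UID = "my-mlx-chat"  # hypothetical model UID

def build_chat_payload(prompt: str, model: str = MODEL_UID) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

def send(prompt: str) -> str:
    """POST one chat request and return the assistant's reply text."""
    req = request.Request(
        ENDPOINT,
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def run_concurrently(prompts: list[str]) -> list[str]:
    """Dispatch all prompts at once; the server batches them continuously."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(send, prompts))
```

Usage: `run_concurrently([f"Summarize item {i}" for i in range(8)])` keeps all eight requests in flight at once, which is where the throughput gain over sequential serving comes from.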
🧠 New Model Support
- Qwen-Image-Layered
- Fun-ASR-Nano-2512
- Fun-ASR-MLT-Nano-2512
⚠️ Python Version Support Change
Starting from this version, Python 3.9 is no longer supported. Please use Python 3.10 or above.
🌐 Community Edition Updates
📦 Installation
- Pip: `pip install 'xinference==1.16.0'`
- Docker: pull the latest image, or update via pip inside the container
🆕 New Model Support
- Qwen-Image-Layered
- Fun-ASR-Nano-2512
- Fun-ASR-MLT-Nano-2512
✨ New Features
- vLLM Backend: Added vLLM engine support for DeepSeek-V3.2 / DeepSeek-V3.2-Exp
- VACC (VASTAI GPU): Support for LLM and VLM inference
- MLX: Chat models support continuous batching for concurrent inference
- Rerank: Support for async batch processing
- Model Launch: Added `architectures` field
- UI: Image models support configuration via environment variables and custom parameters
- MiniMaxM2ForCausalLM: Added vLLM backend support
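The new `architectures` field lets a launch request pin the model architecture explicitly. A minimal sketch of what such a request body could look like, assuming the field takes a list of Hugging Face architecture class names and travels alongside the usual launch parameters (the model name, engine, and exact field placement here are illustrative, not the documented API):

```python
import json

def build_launch_payload(model_name: str, architectures: list[str]) -> dict:
    """Hypothetical model-launch body carrying the new `architectures` field."""
    return {
        "model_name": model_name,        # registered model name (illustrative)
        "model_engine": "transformers",  # example engine choice
        "architectures": architectures,  # assumption: HF architecture class names
    }

payload = build_launch_payload("my-custom-llm", ["Qwen2ForCausalLM"])
print(json.dumps(payload, indent=2))
```

Pinning the architecture is mainly useful for custom or renamed checkpoints where auto-detection from the model config would otherwise pick the wrong class.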
🛠 Enhancements
- Optimized replica allocation for more contiguous GPU index assignment
- Docker image upgraded to CUDA 12.9, using vLLM v0.11.2
- Support for torchaudio 2.9.0
- Continuous updates to model metadata JSON (DeepSeek, GLM, LLaMA, Jina, Z-Image, etc.)
🐞 Bug Fixes
- Fixed PaddleOCR-VL output anomalies
- Fixed custom embedding / rerank parsing errors
- Fixed CPU startup and multi-worker startup issues
- Fixed OCR API returning empty results
- Fixed `n_gpu` parameter handling issues
📚 Documentation Updates
- Updated documentation for new models
- Added v1.15.0 release documentation
🏢 Enterprise Edition Updates
- Ascend Performance Optimization: Further improved inference performance and stability on the Ascend platform
- Enhanced Fine-tuning: Strengthened fine-tuning workflows and capabilities to support more complex enterprise-level training and tuning requirements