🚀 Xinference v1.16.0 Release Notes

New Features, Enhancements, and Bug Fixes

Highlights

🧩 VASTAI GPU (VACC) Support

Added support for VASTAI GPUs (VACC), now extended to VLM (vision-language model) workloads, further broadening the supported hardware ecosystem.

🍎 Apple MLX Backend - Continuous Batching

MLX chat models now support continuous batching, enabling concurrent request processing and significantly improving throughput under concurrent load.
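To exercise this, a client can simply issue overlapping requests against the server's OpenAI-compatible chat endpoint. The sketch below assumes a locally running Xinference server; the URL and the `my-mlx-chat` model uid are placeholders, not values from this release.

```python
"""Sketch: send several chat requests concurrently to a local Xinference
server. The endpoint URL and model uid are illustrative placeholders."""
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:9997/v1/chat/completions"  # OpenAI-compatible API
MODEL_UID = "my-mlx-chat"  # placeholder uid of a launched MLX chat model


def build_payload(prompt: str, model_uid: str = MODEL_UID) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model_uid,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


def ask(prompt: str) -> str:
    """POST one chat request and return the assistant's reply text."""
    req = Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=120) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


def run_concurrently(prompts: list[str]) -> list[str]:
    # With continuous batching, these overlapping requests are batched by
    # the MLX backend rather than served strictly one at a time.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(ask, prompts))
```

Before this release, the same client code would still work against an MLX chat model, but requests were effectively serialized; continuous batching is what makes the concurrent calls pay off.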

🧠 New Model Support

  • Qwen-Image-Layered
  • Fun-ASR-Nano-2512
  • Fun-ASR-MLT-Nano-2512

⚠️ Python Version Support Change

Starting from this version, Python 3.9 is no longer supported. Please use Python 3.10 or above.
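For scripts that embed xinference, an explicit guard makes the new floor visible early. This is an illustrative snippet, not part of xinference itself:

```python
# Illustrative startup guard: fail fast on interpreters older than the
# new minimum (Python 3.10) before importing xinference.
import sys


def require_python(version_info=None, minimum=(3, 10)):
    """Raise RuntimeError if the (given or running) interpreter is too old."""
    version_info = sys.version_info if version_info is None else version_info
    if tuple(version_info[:2]) < minimum:
        raise RuntimeError(
            "Xinference 1.16.0 requires Python %d.%d or above" % minimum
        )
```

Calling `require_python()` with no arguments checks the running interpreter and is a no-op on Python 3.10+.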

🌐 Community Edition Updates

📦 Installation

  • Pip install: pip install 'xinference==1.16.0'
  • Docker: Pull the latest image or update via pip in the container

🆕 New Model Support

  • Qwen-Image-Layered
  • Fun-ASR-Nano-2512
  • Fun-ASR-MLT-Nano-2512

New Features

  • vLLM Backend: Added vLLM engine support for DeepSeek-V3.2 / DeepSeek-V3.2-Exp
  • VACC (VASTAI GPU): Support for LLM and VLM inference
  • MLX: Chat models support continuous batching for concurrent inference
  • Rerank: Support for async batch processing
  • Model Launch: Added `architectures` field
  • UI: Image models support configuration via environment variables and custom parameters
  • MiniMaxM2ForCausalLM: Added vLLM backend support
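The idea behind async batch rerank can be sketched client-side: split a large document list into batches and score them concurrently. Everything below is conceptual; `score_batch` is a placeholder for a real call to a rerank model, not Xinference's implementation.

```python
# Conceptual sketch of async batch reranking: documents are split into
# batches, each batch is scored concurrently, and results are merged.
import asyncio


def chunked(items, size):
    """Yield successive fixed-size batches from items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


async def score_batch(query, batch):
    # Placeholder scorer: a real client would send query+batch to a rerank
    # model and get relevance scores back. Here we score by word overlap.
    await asyncio.sleep(0)  # yield control, as real network I/O would
    query_words = set(query.split())
    return [float(len(query_words & set(doc.split()))) for doc in batch]


async def rerank(query, documents, batch_size=4):
    """Score all documents in concurrent batches; return them ranked."""
    batches = list(chunked(documents, batch_size))
    results = await asyncio.gather(*(score_batch(query, b) for b in batches))
    scores = [s for batch_scores in results for s in batch_scores]
    return sorted(zip(documents, scores), key=lambda p: p[1], reverse=True)
```

The batches are dispatched with `asyncio.gather`, so a slow batch no longer blocks the others from being scored.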

🛠 Enhancements

  • Optimized replica allocation so that GPU indices are assigned more contiguously
  • Docker image upgraded to CUDA 12.9, using vLLM v0.11.2
  • Support for torchaudio 2.9.0
  • Ongoing updates to model metadata JSON (DeepSeek, GLM, LLaMA, Jina, Z-Image, etc.)

🐞 Bug Fixes

  • Fixed PaddleOCR-VL output anomalies
  • Fixed custom embedding / rerank analysis errors
  • Fixed CPU startup and multi-worker startup issues
  • Fixed OCR API returning empty results
  • Fixed `n_gpu` parameter handling issues

📚 Documentation Updates

  • Updated new model documentation
  • Added v1.15.0 release documentation

🏢 Enterprise Edition Updates

  • Ascend Performance Optimization: Further improved inference performance and stability on Ascend platform
  • Enhanced Fine-tuning: Strengthened fine-tuning workflow and capabilities, supporting more complex enterprise-level training and tuning requirements