✅ Highlights
Model Virtual Environments (Virtualenv) Enabled by Default
Starting with v2.0, model virtual environments are enabled by default. Each model can run in its own isolated Python dependency environment.
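To illustrate the idea of per-model isolation, here is a minimal sketch that uses Python's standard `venv` module to create one environment per model. It is not Xinference's implementation. The helper `create_model_env` and the model names `model-a` and `model-b` are hypothetical, chosen only for this example.

```python
import tempfile
import venv
from pathlib import Path


def create_model_env(root: Path, model_name: str) -> Path:
    """Create an isolated virtual environment for one model (illustrative only)."""
    env_dir = root / model_name
    # with_pip=False keeps the sketch fast and offline; a real setup would
    # install each model's own dependencies into its environment.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


def demo() -> dict[str, Path]:
    """Build two separate environments under a temporary directory."""
    root = Path(tempfile.mkdtemp(prefix="model-envs-"))
    # "model-a" and "model-b" are placeholder names, not real model identifiers.
    return {name: create_model_env(root, name) for name in ("model-a", "model-b")}
```

Because each environment has its own directory and `pyvenv.cfg`, packages installed for one model do not affect another.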
CUDA Base Image Unified on 12.9
From v2.0 onwards, the official CUDA base images require CUDA 12.9 as the minimum version. Images for older versions (such as 12.4) have been removed. A cleaner base environment provides a reliable foundation for next-generation inference engines and high-performance features.
Full Support for Qwen3-VL Multimodal Embedding & Reranker Models
Added complete support for:
- Qwen3-VL-Embedding (2B / 8B)
- Qwen3-VL-Reranker (2B / 8B)
These models cover key capabilities such as multimodal vectorization, retrieval, and reranking.
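As a rough picture of how embedding and reranking fit together, the sketch below runs a two-stage pipeline on toy data: similarity search over vectors, followed by a reordering of the shortlist. It does not call Xinference or the Qwen3-VL models. The vectors, file names, and `toy_rerank_scores` are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, doc_vecs, top_k):
    """Stage 1: rank items by embedding similarity and keep the top_k ids."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:top_k]


def rerank(candidates, score_fn):
    """Stage 2: reorder the shortlist with a separate (usually costlier) scorer."""
    return sorted(candidates, key=score_fn, reverse=True)


# Toy stand-ins for embedding-model output (hypothetical values).
doc_vecs = {"cat.jpg": [0.9, 0.1], "dog.jpg": [0.7, 0.3], "car.jpg": [0.0, 1.0]}
query_vec = [1.0, 0.0]

# Toy stand-in for reranker-model scores (hypothetical values).
toy_rerank_scores = {"cat.jpg": 0.2, "dog.jpg": 0.8}

shortlist = retrieve(query_vec, doc_vecs, top_k=2)
final_order = rerank(shortlist, toy_rerank_scores.get)
```

In a real deployment, the vectors would come from an embedding model and the second-stage scores from a reranker model; the control flow stays the same.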
🌐 Community Edition Updates
pip install 'xinference==2.0.0'
🆕 New Model Support
- Qwen3-VL-Embedding-2B / 8B
- Qwen3-VL-Reranker-2B / 8B
- MinerU2.5-2509-1.2B
- Z-Image
- GLM-4.6
✨ New Features
- Model virtual environments enabled by default, supporting independent dependencies and engines
- Support for chat_template.jinja, making prompt templates more flexible
- Support for configuring virtual environments per engine
- Added GGUF cache management for multimodal video models
- Added JSON parsing capability for custom LLM model configurations
🛠 Enhancements
- Model configuration now supports declaring caching strategies to reduce unnecessary downloads
- Continuous updates to model JSONs, covering LLM / Image / Video / Embedding / Rerank
🐞 Bug Fixes
- Fixed compatibility issues with MUSA, Xavier, and other platforms
- Fixed exceptions when downloading and caching multimodal models
- Fixed an issue where newer versions of vLLM could not start embedding models
- Fixed a potential deadlock during concurrent downloads
📚 Documentation Updates
- Added Xinference 2.0 usage documentation
- Added solutions for common Virtualenv CUDA / NCCL / cuDNN issues
- Updated historical version release notes
🏢 Enterprise Edition Updates
Page-Level Permission Control Released
Supports finer-grained control over access and operation permissions, meeting the needs of enterprise multi-role and multi-team scenarios.
Brand-New, Modern UI Design
The Enterprise Edition interface has been fully redesigned with a more modern style and clearer interactions, improving the experience of managing and operating models at scale.
Upgrade to Xinference 2.0 and Try It Out
We welcome your continued feedback and contributions 🚀