🚀 Milestone Release

Xinference 2.0 Officially Released

v2.0 is a significant milestone release, marking a key step forward in runtime isolation, foundational capabilities, and the model ecosystem.

Highlights

🧩 Model Virtual Environments (Virtualenv) Enabled by Default

Starting from v2.0, model virtual environments are enabled by default. Each model can run in an independent Python dependency space.

Each model can declare its own dependencies and inference-engine configuration. This avoids model startup failures caused by dependency version conflicts, significantly improving system stability and maintainability.
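Conceptually, giving each model its own dependency space is like creating a per-model virtual environment. As an illustration only (this uses Python's stdlib `venv` module, not Xinference's internal mechanism, and the model name is made up):

```python
import tempfile
import venv
from pathlib import Path

# Create an isolated environment directory for a hypothetical model.
# with_pip=False keeps the example fast; Xinference manages its model
# virtualenvs internally, so the real layout may differ.
env_dir = Path(tempfile.mkdtemp()) / "qwen3-vl-env"
venv.create(env_dir, with_pip=False)

# The environment gets its own interpreter configuration, so packages
# installed into it cannot conflict with another model's dependencies.
print((env_dir / "pyvenv.cfg").exists())  # → True
```

Each environment is self-contained, which is why two models can depend on incompatible versions of the same library without interfering with each other.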
🚀 CUDA Base Image Unified Upgrade to 12.9

From v2.0 onwards, the minimum CUDA version for the official base image is unified at CUDA 12.9, and older images (such as 12.4) have been removed. A cleaner foundational environment provides a reliable basis for next-generation inference engines and high-performance features.

🧠 Full Support for Qwen3-VL Multimodal Embedding & Reranker Models

Added complete support for:

  • Qwen3-VL-Embedding (2B / 8B)
  • Qwen3-VL-Reranker (2B / 8B)

These cover key capabilities such as multimodal vectorization, retrieval, and reranking.
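To illustrate what vectorization and retrieval mean in practice, here is a minimal cosine-similarity retrieval sketch. The vectors below are toy values; in a real pipeline they would come from an embedding model such as Qwen3-VL-Embedding:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings standing in for model output.
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.1, 0.9, 0.3],
}

# Retrieval: rank documents by similarity to the query embedding.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # → ['doc_a', 'doc_b']
```

A reranker model such as Qwen3-VL-Reranker refines this further: instead of comparing precomputed vectors, it scores each (query, candidate) pair jointly, which is typically more accurate for the final ordering of the top retrieved results.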

🌐 Community Edition Updates

📦 Installation
  • pip: pip install 'xinference==2.0.0'
  • Docker: pull the latest image, or update via pip inside the container

🆕 New Model Support

  • Qwen3-VL-Embedding-2B / 8B
  • Qwen3-VL-Reranker-2B / 8B
  • MinerU2.5-2509-1.2B
  • Z-Image
  • GLM-4.6

✨ New Features

  • Model virtual environments enabled by default, supporting independent dependencies and engines
  • Support for chat_template.jinja, making prompt templates more flexible
  • Support for configuring virtual environments per engine
  • Added GGUF cache management for multimodal video models
  • Added JSON parsing for custom LLM model configurations
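The chat_template.jinja support above follows the common Jinja chat-template convention. As a hedged illustration (the template below is invented for this example, not one shipped with Xinference), rendering such a template with the jinja2 library looks like this:

```python
from jinja2 import Template

# A hypothetical chat template in the usual Jinja style: it wraps each
# message in role markers. Real templates ship with each model.
template_src = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>{{ m['content'] }}<|end|>\n"
    "{% endfor %}"
)

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello!"},
]

prompt = Template(template_src).render(messages=messages)
print(prompt)
# → <|system|>You are helpful.<|end|>
#   <|user|>Hello!<|end|>
```

Because the template is a plain file, prompt formatting can be adjusted per model without touching any code.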

🛠 Enhancements

  • Model configuration now supports declaring caching strategies to reduce unnecessary downloads
  • Continuous updates to model JSONs, covering LLM / Image / Video / Embedding / Rerank

🐞 Bug Fixes

  • Fixed compatibility issues with MUSA, Xavier, and other platforms
  • Fixed exceptions in downloading and caching multimodal models
  • Fixed an issue where newer versions of vLLM could not start embedding models
  • Fixed a potential deadlock during concurrent downloads

📚 Documentation Updates

  • Added Xinference 2.0 usage documentation
  • Added solutions for common Virtualenv CUDA / NCCL / cuDNN issues
  • Updated historical version release notes

🏢 Enterprise Edition Updates

🔐 Page-Level Permission Control Released

The Enterprise Edition now supports finer-grained control over access and operation permissions, meeting the needs of enterprise-level multi-role and multi-team scenarios.

🎨 Brand New, Modern UI Design

The Enterprise Edition interface has been fully upgraded with a more modern style and clearer interactions, significantly enhancing the experience of large-scale model management and operations.

We invite you to upgrade and experience Xinference 2.0.

We welcome your continued feedback and contributions 🚀
