Xinference v1.10.1

Release Notes

Visit xinference.io

Highlights

Extensive Model Support

Qwen3-VL and Qwen3-Next — Latest members of the Qwen3 family
Qwen-Image-Edit-2509 — Continuously enhanced image editing capabilities
IndexTTS2 — New high-quality speech synthesis model

sglang Engine Supports Structured Output

Now supports OpenAI-style json structured output, further enhancing compatibility with external APIs.

Community Edition Updates

Installation

pip install: pip install 'xinference==1.10.1'

Docker usage:

Pull the latest image, or update with pip inside the container

New Model Support

IndexTTS2

Qwen-Image-Edit-2509

Baichuan-M2

Qwen3-VL

Qwen3-Next

New Features

sglang supports json structured output
UI: Support for request_limits parameter
UI: Dynamic detection of download_hub
Support listing flexible models via WebUI and command line

Enhancements

Optimized Qwen2.5-VL performance on Mac MPS
deepseek-r1-0528 model supports tool_calls
Upgraded funasr dependency
Updated CUDA 12.8 Dockerfile
UI model launch page refactored

Bug Fixes

Optimized rerank model lookup logic, support for video model type
Fixed VLLM version dependency issue required by seed-oss
Fixed registration issue when model names are duplicated
Fixed issue where custom model drawer component in UI cannot be opened
Fixed issue where registered models cannot use tools
Fixed finish_reason field processing logic
Fixed vllm structured output compatibility issue

Documentation Updates

Updated README with additional MaxKB related information

Enterprise Edition Updates (v0.2.4 Release)

Enhanced PDF Parsing Capabilities

Added support for more PDF parsing models, further strengthening enterprise document processing capabilities

Feature and Performance Optimization

Overall usability and stability continuously improved