Key Highlights
New Model Support
- Qwen3-4B Instruct / Thinking
- MiniCPM-V 4.5
vLLM Engine Enhancements
- Multi-model loading support
- AWQ 8-bit quantization support
- vLLM upgraded to 0.10.2 in the CUDA 12.8 image
OpenAI Image Edit API Support
Directly compatible with the OpenAI images/edits interface, which improves interoperability with image editing and generation models.
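As a rough illustration of what a call to the images/edits interface can look like, the sketch below builds a multipart request using only the Python standard library. It assumes the endpoint follows the OpenAI convention (`POST /v1/images/edits` with `model`, `prompt`, and `image` form fields). The base URL `http://localhost:9997`, the model name `my-image-model`, and the helper names `build_image_edit_request` and `send` are illustrative assumptions, not values taken from these release notes.

```python
import json
import uuid
import urllib.request


def build_image_edit_request(base_url, model, prompt, image_bytes,
                             image_name="input.png", api_key="not-needed"):
    """Build (but do not send) a multipart POST to {base_url}/v1/images/edits.

    Hypothetical helper: the field names follow the OpenAI image edit API.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields.
    for name, value in (("model", model), ("prompt", prompt)):
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The source image is sent as a file field named "image".
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="image"; '
         f'filename="{image_name}"\r\n'
         f"Content-Type: image/png\r\n\r\n").encode("utf-8")
        + image_bytes + b"\r\n"
    )
    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/images/edits",
        data=b"".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Bearer {api_key}",
        },
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed local endpoint and placeholder model name; adjust to your deployment.
request = build_image_edit_request(
    "http://localhost:9997", "my-image-model",
    "Replace the sky with a sunset", b"<png bytes go here>",
)
```

Calling `send(request)` against a running server with an image model loaded should return a JSON object. In the OpenAI convention the edited images are listed under a `data` key, but the exact response shape depends on the model and server configuration.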
Community Edition Updates
Installation
pip
pip install 'xinference==1.11.0.post1'
Docker
Pull the latest image, or update with pip inside the container.
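After either upgrade path, it can help to confirm which version is actually installed in the environment or container. The following is a minimal sketch using the standard library. The helper name `installed_version` is an illustrative assumption, and the expected version is copied from the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


EXPECTED = "1.11.0.post1"  # version pinned in the pip command above

if __name__ == "__main__":
    found = installed_version("xinference")
    if found is None:
        print("xinference is not installed in this environment")
    elif found == EXPECTED:
        print(f"xinference {found} is installed")
    else:
        print(f"xinference {found} is installed, expected {EXPECTED}")
```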
New Model Support
Qwen3-4B
Instruct / Thinking versions
MiniCPM-V 4.5
Vision Language Model
New Features
OpenAI image edit API support
vLLM multi-model loading support (including Omni, image, video, and audio models)
vLLM AWQ 8-bit quantization support
vLLM upgraded to 0.10.2 in the CUDA 12.8 image
Improvements & Fixes
Fixed UI button issue when n_gpu_layers=-1
Fixed CI build and CUDA 12.8 Dockerfile issues
Synchronized multimodal model JSON (audio, image, video, LLM)
Enterprise Edition Updates
Kubernetes Operator Preliminary Support
Automatic model replica scheduling and lifecycle management, providing a unified interface for clustered inference
Stability Enhancements
Fixed several known issues; overall operation is more stable and reliable