Xinference v1.11.0.post1

Release Notes ยท xinference.io

โœ… Key Highlights

๐Ÿง 

New Model Support

  • โ€ข Qwen3-4B Instruct / Thinking
  • โ€ข MiniCPM-V 4.5
โš™๏ธ

VLLM Engine Enhancements

  • โ€ข Multi-model loading support
  • โ€ข AWQ 8bit quantization support
  • โ€ข VLLM upgraded to 0.10.2 in CUDA 12.8 image
๐Ÿ–ผ๏ธ

OpenAI Image Edit API Support

Direct compatibility with images/edits interface, enhancing compatibility with image editing and generation models.

๐ŸŒ Community Edition Updates

๐Ÿ“ฆ Installation

pip pip install 'xinference==1.11.0.post1'
Docker Pull the latest image or update using pip in container

๐Ÿ†• New Model Support

Qwen3-4B

Instruct / Thinking versions

MiniCPM-V 4.5

Vision Language Model

โœจ New Features

OpenAI image edit API support

VLLM multi-model loading support (including Omni, image, video, audio models)

VLLM AWQ 8bit quantization support

CUDA 12.8 image upgraded VLLM to 0.10.2

๐Ÿ›  Improvements & Fixes

Fixed UI button issue when n_gpu_layers=-1

Fixed CI build and CUDA 12.8 Dockerfile issues

Synchronized multimodal model JSON (audio, image, video, LLM)

๐Ÿข Enterprise Edition Updates

๐Ÿš€ Kubernetes Operator Preliminary Support

Automatic model replica scheduling and lifecycle management, providing unified interface for clustered inference

๐Ÿ”’ Stability Enhancements

Fixed several known issues, overall operation is more stable and reliable