πŸš€

Xinference v1.17.0

Release Notes

⚠️ Important Notice

v1.17.0 is the last release of the Xinference v1 series.

βœ… Key Highlights

Discover the major improvements and new features in this release

🧩 MThreads GPU (MUSA) Support

Added native support for domestic MThreads GPUs, further improving multi-hardware ecosystem compatibility and bringing more flexibility to your AI deployments.

πŸ–ΌοΈ

Multi-modal Engine Upgrade

  • β€’ OCR: Added Apple MLX engine support
  • β€’ Image Models: Now support multi-engine switching
  • β€’ Video Models: Added GGUF quantization format support
πŸš€

vLLM Distributed & Enhancement

  • β€’ Fixed and improved multi-machine distributed inference for vLLM β‰₯ 0.11.0
  • β€’ Added RoPE Scaling and MTP (Multi-Token Prediction) parameter support
🧠 New Model Support

  • Qwen-Image-Edit-2511
  • Qwen-Image-2512

🌐 Community Edition Updates

Open-source enhancements and new features for everyone

πŸ“¦ Installation Methods

pip install:

pip install 'xinference==1.17.0'

Docker:

Pull the latest image or update via pip inside the container
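A minimal sketch of both routes, assuming the project's published `xprobe/xinference` image name (verify the exact tag on your registry):

```shell
# Pull the v1.17.0 image (image name/tag assumed, check your registry)
docker pull xprobe/xinference:v1.17.0

# Or update Xinference inside an already running container via pip
docker exec -it <container> pip install 'xinference==1.17.0'
```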

πŸ†• New Model Support

  • Qwen-Image-Edit-2511
  • Qwen-Image-2512
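Once upgraded, the new image models can be launched through the usual CLI flow; this invocation is a hypothetical sketch, so confirm flag names and the model-type value with `xinference launch --help`:

```shell
# Hypothetical launch of one of the newly supported models
xinference launch --model-name Qwen-Image-Edit-2511 --model-type image
```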

✨ New Features

  • βœ“ Support for enable_thinking parameter
  • βœ“ Added MThreads GPU (MUSA) support
  • βœ“ vLLM β‰₯ 0.11.0 distributed model launch
  • βœ“ OCR multi-engine + MLX backend
  • βœ“ Image models multi-engine switching
  • βœ“ Video models GGUF quantization
  • βœ“ Sentence-Transformers rerank auto batch
  • βœ“ Added FP4 inference support
  • βœ“ Added MiniMax tool call support

πŸ›  Enhancements

  • βœ“ vLLM MTP & RoPE Scaling parameters
  • βœ“ Model metadata updates (DeepSeek, OCR, R1)

🐞 Bug Fixes

  • βœ“ Fixed vLLM embedding/rerank empty cache
  • βœ“ Fixed worker duplicate selection
  • βœ“ Fixed vLLM OCR model stop issue
  • βœ“ Fixed model download cancel issue

πŸ“š Documentation Updates

πŸ“„ Updated v1.16.0 release notes
🐳 Improved Docker documentation
πŸ”§ vLLM + Torch compatibility notes

🏒 Enterprise Edition Updates

Advanced features for production deployments at scale

☸️ Kubernetes Support

  • βœ“ Optimized deployment and scheduling in K8s environments
  • βœ“ Improved stability in multi-node scenarios
  • βœ“ Enhanced maintainability for multi-replica deployments
⚑

KV Cache Architecture

  • βœ“ Decentralized, engine-agnostic KV cache storage
  • βœ“ Cross-engine PD separation (Prefill/Decode)
  • βœ“ Foundation for heterogeneous inference collaboration

Ready to Upgrade?

Experience the latest features and improvements in Xinference v1.17.0