🚀 Xinference v1.13.0 Release Notes
Visit xinference.io

Key Highlights

🏪 Xinference Model Hub Official Launch

Get the latest models directly from model.xinference.io, without waiting for a new Xinference release.

Documentation: Model Update Guide

Embedding Auto-Batch Support

Multiple concurrent embedding requests are automatically batched, significantly improving throughput:

  • Parallel requests automatically aggregated into efficient batches
  • Transparent to applications - no code changes required
  • Average response time reduced by up to 10x
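The batching behavior can be sketched as follows. This is an illustrative model only, not Xinference's actual implementation (the `AutoBatcher` class and its names are hypothetical): concurrent single-request calls are queued, a worker drains whatever is pending into one batch, and the whole batch is encoded in a single call.

```python
import threading
import queue
from concurrent.futures import ThreadPoolExecutor

class AutoBatcher:
    """Illustrative auto-batcher (hypothetical; not Xinference's actual code).

    Concurrent embed() calls are queued, drained into one batch by a single
    worker thread, and encoded together in one call to the model."""

    def __init__(self, encode_batch, max_batch_size=32):
        self.encode_batch = encode_batch      # fn: list[str] -> list[list[float]]
        self.max_batch_size = max_batch_size
        self.pending = queue.Queue()          # (text, result-slot, done-event) triples
        self.batch_sizes = []                 # records how requests were grouped
        threading.Thread(target=self._worker, daemon=True).start()

    def embed(self, text):
        """Caller-facing API: looks like an ordinary single-request call."""
        done = threading.Event()
        slot = {}
        self.pending.put((text, slot, done))
        done.wait()                           # block until the worker fills the slot
        return slot["vec"]

    def _worker(self):
        while True:
            # Block for the first request, then drain whatever else is queued.
            batch = [self.pending.get()]
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self.pending.get_nowait())
                except queue.Empty:
                    break
            self.batch_sizes.append(len(batch))
            vectors = self.encode_batch([text for text, _, _ in batch])
            for (_, slot, done), vec in zip(batch, vectors):
                slot["vec"] = vec
                done.set()

# Usage: four concurrent requests; the fake encoder embeds a text as [len(text)].
batcher = AutoBatcher(lambda texts: [[float(len(t))] for t in texts])
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(batcher.embed, ["a", "bb", "ccc", "dddd"]))
```

Because callers only see `embed()`, the aggregation stays invisible to applications, which is why no client-side code changes are required.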

🌐 Community Edition Updates

Installation

pip install:

pip install 'xinference==1.13.0'

Docker:

docker pull xprobe/xinference:v1.13.0

or update an existing container in place with `pip install 'xinference==1.13.0'`.

🆕 New Model Support

  • Qwen3-VL-MLX (Multimodal support with MLX framework)

New Features

  • Auto-batch embedding
  • Update models from Xinference Model Hub
  • Support for updating model JSON metadata
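The model JSON metadata mentioned above is the per-model spec file Xinference uses to describe a model. The fragment below is a sketch only: the field names follow the general shape of Xinference's custom-model JSON, and the model name and ID are invented placeholders, not a real v1.13.0 entry.

```json
{
  "version": 1,
  "model_name": "my-custom-llm",
  "model_lang": ["en"],
  "model_ability": ["chat"],
  "model_specs": [
    {
      "model_format": "pytorch",
      "model_size_in_billions": 7,
      "quantizations": ["none"],
      "model_id": "my-org/My-Custom-LLM-7B"
    }
  ]
}
```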

🛠 Enhancements

  • IndexTTS2 streaming output support
  • IndexTTS2 offline deployment support
  • Added embedding benchmark
  • Fixed CI build issues caused by PEFT version

🐞 Bug Fixes

  • Fixed DeepSeek-OCR runtime exceptions in Docker
  • Tool call IDs are now generated from UUIDs to avoid duplicates
  • Fixed audio/video/image model cache list display issues
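The tool-call ID fix follows a standard pattern: derive each ID from a fresh UUID rather than a counter or timestamp, so IDs generated concurrently cannot collide. A minimal sketch, where `make_tool_call_id` and the `call_` prefix are illustrative assumptions rather than Xinference's exact helper and format:

```python
import uuid

def make_tool_call_id() -> str:
    """Return a collision-safe tool-call ID (hypothetical helper).

    uuid4 carries 122 random bits, so IDs generated in parallel across
    requests are unique for all practical purposes."""
    return f"call_{uuid.uuid4().hex}"

# Even a large burst of IDs produces no duplicates.
ids = [make_tool_call_id() for _ in range(10_000)]
```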

📚 Documentation Updates

  • New model documentation
  • v1.12.0 installation guide for uv
  • Model online update mechanism documentation

🏢 Enterprise Edition Updates

Added MinerU 2.5 Support

More powerful PDF/document parsing capabilities

Added paddleocr-vl Support

An integrated OCR and visual-understanding model, suited to a wider range of business scenarios

System Stability Enhancement

Fixed multiple stability issues, improving reliability for large-scale cluster operations