✅ Key Highlights
Xinference Model Hub Official Launch
Pull the latest models from model.xinference.io without waiting for a new Xinference release to add them.
Documentation: Model Update Guide
Embedding Auto-Batch Support
Multiple concurrent embedding requests are automatically batched, significantly improving throughput:
- Parallel requests automatically aggregated into efficient batches
- Transparent to applications - no code changes required
- Average response time reduced by up to 10x
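Conceptually, the auto-batching behaves like a micro-batcher: requests that arrive close together are merged into a single model call. Below is a minimal, self-contained sketch of the idea; the `AutoBatcher` class and the `fake_embed` stand-in are illustrative only, not Xinference's actual implementation:

```python
import queue
import threading
import time

class AutoBatcher:
    """Sketch of server-side auto-batching: concurrent embed() calls are
    queued, aggregated during a short grace window, and sent to the model
    as one batch. Names here are illustrative, not Xinference internals."""

    def __init__(self, embed_fn, max_batch=32, wait_s=0.05):
        self._embed_fn = embed_fn    # model call: list[str] -> list[vector]
        self._max_batch = max_batch
        self._wait_s = wait_s
        self._queue = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def embed(self, text):
        """Called by each request thread; blocks until its vector is ready."""
        done = threading.Event()
        slot = {}
        self._queue.put((text, slot, done))
        done.wait()
        return slot["vector"]

    def _loop(self):
        while True:
            batch = [self._queue.get()]   # block until the first request
            time.sleep(self._wait_s)      # grace window to aggregate more
            while len(batch) < self._max_batch:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            # One model call serves the whole batch
            vectors = self._embed_fn([text for text, _, _ in batch])
            for (_, slot, done), vec in zip(batch, vectors):
                slot["vector"] = vec
                done.set()

# Stand-in "model": the embedding of a text is just [len(text)]
batch_sizes = []

def fake_embed(texts):
    batch_sizes.append(len(texts))
    return [[float(len(t))] for t in texts]

batcher = AutoBatcher(fake_embed)

results = {}
def request(text):
    results[text] = batcher.embed(text)

threads = [threading.Thread(target=request, args=(f"doc-{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the callers are ordinary blocking requests, no application-side changes are needed; the aggregation happens entirely behind the `embed()` interface.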
🌐 Community Edition Updates
Installation
pip install:
pip install 'xinference==1.13.0'
Docker:
Pull the latest image (e.g. `docker pull xprobe/xinference:latest`) or run the pip command above inside an existing container.
🆕 New Model Support
- Qwen3-VL-MLX (Multimodal support with MLX framework)
✨ New Features
- Auto-batch embedding
- Update models from Xinference Model Hub
- Support for updating model JSON metadata
🛠 Enhancements
- IndexTTS2 streaming output support
- IndexTTS2 offline deployment support
- Added an embedding benchmark
- Fixed CI build failures caused by the PEFT version
🐞 Bug Fixes
- Fixed DeepSeek-OCR runtime exceptions in Docker
- Tool call ID uses UUID to avoid duplicates
- Fixed cache list display issues for audio/video/image models
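The tool call ID fix above follows the standard pattern of deriving IDs from random UUIDs. A minimal sketch; the `call_` prefix mirrors the common OpenAI-style convention and is an assumption here, not Xinference's exact format:

```python
import uuid

def new_tool_call_id() -> str:
    # A random UUIDv4 makes collisions between concurrent tool calls
    # vanishingly unlikely, unlike counters or timestamps, which can
    # repeat across parallel requests.
    return f"call_{uuid.uuid4().hex}"

# Generating many IDs yields no duplicates
ids = {new_tool_call_id() for _ in range(10_000)}
```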
📚 Documentation Updates
🏢 Enterprise Edition Updates
Added MinerU 2.5 Support
More powerful PDF/document parsing capabilities
Added paddleocr-vl Support
An integrated OCR and visual understanding model, suitable for a wider range of business scenarios
System Stability Enhancements
Fixed multiple issues to improve reliability in large-scale cluster deployments