Parallel Model Loading
- Multiple replicas of a model can now load in parallel
- The UI displays loading progress for each replica independently
- View the status of every replica on the instance page
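The idea behind the bullets above can be illustrated with a small sketch. This is not Xinference's implementation: `load_replica` and `load_replicas_in_parallel` are hypothetical names, and the "loading" is simulated by counting steps. It only shows how several replicas can load concurrently while each one reports its own progress.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def load_replica(replica_id, steps, progress, lock):
    """Simulate loading one replica and record its own progress (hypothetical helper)."""
    for step in range(1, steps + 1):
        # A real loader would read model weights here; this sketch only counts steps.
        with lock:
            progress[replica_id] = step / steps
    return replica_id


def load_replicas_in_parallel(num_replicas, steps=4):
    """Load all replicas concurrently; return the loaded ids and per-replica progress."""
    progress = {i: 0.0 for i in range(num_replicas)}
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=num_replicas) as pool:
        futures = [
            pool.submit(load_replica, i, steps, progress, lock)
            for i in range(num_replicas)
        ]
        loaded = sorted(f.result() for f in futures)
    return loaded, progress


loaded, progress = load_replicas_in_parallel(3)
for replica_id in loaded:
    print(f"replica {replica_id}: {progress[replica_id]:.0%}")
```

Each replica writes only to its own entry in `progress`, which is what lets a status page show one progress bar per replica instead of a single aggregate.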
Xinference v1.14.0 adds parallel model loading, enhanced virtual environments, and improved engine diagnostics.
Displays the specific reason an engine is unavailable, so troubleshooting no longer requires guesswork
Rerank models now support the llama.cpp backend for lightweight inference
Full compatibility with the latest vLLM version
High-quality image generation model with enhanced capabilities
pip install: pip install 'xinference==1.14.0'
Docker: pull the latest image, or update with pip inside the container
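After upgrading by either route, it can help to confirm which version is actually installed in the active environment. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, and the distribution name `xinference` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("xinference")
if found is None:
    print("xinference is not installed in this environment")
else:
    print(f"xinference {found} is installed")
```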
Multi-replica loading and faster inference scheduling for enhanced performance
Fixed multiple issues for more reliable enterprise cluster operations