Migrating to vLLM v0.11.1
Version v0.11.1 introduces 4 breaking changes. This guide details how to update your code.
Released: 11/18/2025
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
vllm.worker, VllmConfig, MotifForCausalLM, get_input_embeddings_v0, Phi4FlashForCausalLM, NixlConnector, torch.compile, MiDashengLM, GLM4 MoE Reasoning Parser
Breaking Changes
● Issue #1: The vllm.worker module has been removed; update imports to use the new internal structure (see the import sketch after this list).
● Issue #2: MotifForCausalLM model support has been removed.
● Issue #3: Speculative decode method names for MTP have been consolidated, which may affect custom implementations that rely on the old naming conventions.
● Issue #4: The V0 conditions for multimodal embeddings merging have been removed; migrate to the V1 logic.
Migration Steps
1. Update torch to 2.9.0 and CUDA to 12.9.1 to use the new default build features.
2. Replace imports from vllm.worker with the updated core worker locations.
3. Update speculative decoding implementations to use the consolidated MTP method names.
4. If you use Anthropic clients, point them at the new /v1/messages endpoint exposed by vllm serve (see the client sketch after this list).
5. Update custom model loaders to import VllmConfig from config/vllm.py instead of config/__init__.py.
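For step 4, here is a minimal sketch using the official anthropic Python SDK, assuming the server was started with vllm serve on the default port 8000 and without an API key; the model name below is illustrative only.

    import anthropic

    # Point the Anthropic client at the local vLLM server; the SDK appends
    # /v1/messages to this base URL.
    client = anthropic.Anthropic(
        base_url="http://localhost:8000",
        api_key="EMPTY",  # placeholder; only matters if the server enforces a key
    )

    response = client.messages.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative; use the name of the served model
        max_tokens=128,
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(response.content[0].text)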
Release Summary
This release updates vLLM to PyTorch 2.9.0 and CUDA 12.9.1, introduces Anthropic API compatibility, and significantly improves the stability of async scheduling and torch.compile integration.
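A quick way to confirm the installed build matches the new defaults (version numbers taken from this release; note that torch.version.cuda reports only major.minor):

    import torch

    # v0.11.1 builds against PyTorch 2.9.0 and CUDA 12.9.1 by default
    print("torch:", torch.__version__)          # expected: 2.9.0
    print("cuda runtime:", torch.version.cuda)  # expected: 12.9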
Need More Details?
View the full release notes and all changes for vLLM v0.11.1.