Migrating to vLLM v0.11.0
Version v0.11.0 introduces 4 breaking changes. This guide details how to update your code.
Released: 10/2/2025
⚠️ Check Your Code
If you use any of these symbols, you need to read this guide:
AsyncLLMEngine, LLMEngine, MQLLMEngine, xm.mark_step, MultiModalPlaceholderMap, LLM.apply_model, FlashInfer, DeepGEMM, AsyncOutputProcessor, Sampler

Breaking Changes
● Issue #1
Complete removal of the V0 engine: V1 is now the only engine in the codebase. Code relying on V0-specific components such as AsyncLLMEngine or MQLLMEngine will fail.
● Issue #2
The default CUDA graph mode changed to FULL_AND_PIECEWISE. While generally better, this may impact models that only support PIECEWISE mode.
● Issue #3
C++17 is now enforced globally for builds.
● Issue #4
Removal of the tokenizer group and various V0-specific model runner and executor classes.
Migration Steps
1. Upgrade your codebase to the V1 engine interfaces; the V0 classes (AsyncLLMEngine, LLMEngine, MQLLMEngine) have been removed.
2. Ensure build environments support C++17.
3. Update TPU code to use torch_xla.sync instead of xm.mark_step.
4. Review CUDA graph settings if you use models incompatible with FULL_AND_PIECEWISE mode.
5. Avoid --async-scheduling in this version if preemption is required, as it may produce gibberish output.
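For step 3, code that must run against both older and newer torch_xla releases can resolve the step-sync function once at startup. A hedged sketch (the `resolve_sync` helper is ours; the modules are passed in so the logic can be exercised without a TPU):

```python
def resolve_sync(torch_xla_mod, xm_mod=None):
    """Return the step-sync callable for whichever torch_xla API is present."""
    if hasattr(torch_xla_mod, "sync"):
        return torch_xla_mod.sync          # new API, expected by vLLM v0.11.0
    if xm_mod is not None and hasattr(xm_mod, "mark_step"):
        return xm_mod.mark_step            # legacy API, pre-rename
    raise RuntimeError("no step-sync function found in torch_xla")


# Typical usage on a real TPU host (assumes torch_xla is installed):
#   import torch_xla
#   import torch_xla.core.xla_model as xm
#   sync = resolve_sync(torch_xla, xm)
#   ...
#   sync()  # flush pending XLA operations after each step
```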
Release Summary
This release marks the complete transition to the V1 engine, removing all V0 components while introducing support for DeepSeek-V3.2 and Qwen3 architectures. It features significant performance optimizations including KV cache CPU offloading, DeepGEMM by default, and Dual-Batch Overlap.
Need More Details?
View the full release notes and all changes for vLLM v0.11.0.