
Migrating to vLLM v0.11.0

vLLM v0.11.0 introduces 4 breaking changes. This guide details how to update your code.

Released: October 2, 2025

4 Breaking Changes · 5 Migration Steps · 10 Affected Symbols

⚠️ Check Your Code

If you use any of these symbols, you need to read this guide:

AsyncLLMEngine, LLMEngine, MQLLMEngine, xm.mark_step, MultiModalPlaceholderMap, LLM.apply_model, FlashInfer, DeepGEMM, AsyncOutputProcessor, Sampler

Breaking Changes

Issue #1

Complete removal of the V0 engine. V1 is now the only engine in the codebase. Code relying on V0-specific components like AsyncLLMEngine or MQLLMEngine will fail.
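If your code constructed AsyncLLMEngine or LLMEngine directly, the simplest path is usually the stable high-level API, which is backed by the V1 engine in this release. A minimal sketch (the model name is illustrative):

```python
from vllm import LLM, SamplingParams

# The high-level LLM class runs on the V1 engine; no direct
# engine construction is needed for offline inference.
llm = LLM(model="facebook/opt-125m")  # illustrative model name
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```

For async serving, move to the V1 async engine entry points or the OpenAI-compatible server rather than MQLLMEngine.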

Issue #2

The default CUDA graph mode has changed to FULL_AND_PIECEWISE. While generally better for performance, it may impact models that only support PIECEWISE mode.
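If a model only supports piecewise graphs, you can override the new default through the compilation config. A minimal sketch, assuming the cudagraph_mode key exposed by recent vLLM compilation configs (verify the exact key against your installed version):

```python
from vllm import LLM

# Force PIECEWISE CUDA graphs for a model that cannot run the new
# FULL_AND_PIECEWISE default. The config key is an assumption; check
# vllm.config.CompilationConfig for your version.
llm = LLM(
    model="my-org/my-model",  # illustrative model name
    compilation_config={"cudagraph_mode": "PIECEWISE"},
)
```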

Issue #3

C++17 is now enforced globally when building vLLM from source.

Issue #4

Removal of the TokenizerGroup abstraction and various V0-specific model runner and executor classes.
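If you previously reached into the engine's tokenizer-group internals, use the public accessor instead. A minimal sketch (model name illustrative):

```python
from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # illustrative model name

# Fetch the underlying tokenizer through the public API instead of
# the removed tokenizer-group plumbing.
tokenizer = llm.get_tokenizer()
print(tokenizer.encode("hello vLLM"))
```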

Migration Steps

1. Upgrade your codebase to use the V1 engine interfaces, as V0 (AsyncLLMEngine, LLMEngine) has been removed.
2. Ensure build environments support C++17.
3. Update TPU code to use torch_xla.sync instead of xm.mark_step (see the sketch after this list).
4. Review CUDA graph settings if using models incompatible with FULL_AND_PIECEWISE mode.
5. Note: Avoid using --async-scheduling in this version if preemption is required, as it may produce gibberish output.
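For step 3, the TPU change is a one-line substitution, assuming a torch_xla release recent enough to expose the top-level sync helper:

```python
import torch_xla

# Old V0-era TPU synchronization (no longer used by vLLM):
#   import torch_xla.core.xla_model as xm
#   xm.mark_step()

# New equivalent on recent torch_xla releases:
torch_xla.sync()
```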

Release Summary

This release marks the complete transition to the V1 engine, removing all V0 components while introducing support for DeepSeek-V3.2 and Qwen3 architectures. It features significant performance optimizations, including KV cache CPU offloading, DeepGEMM enabled by default, and Dual-Batch Overlap.

Need More Details?

View the full release notes and all changes for vLLM v0.11.0.
