
Migrating to vLLM v0.13.0

vLLM v0.13.0 introduces 7 breaking changes. This guide explains how to update your code.

Released: December 19, 2025

7 Breaking Changes · 5 Migration Steps · 9 Affected Symbols

⚠️ Check Your Code

If you use any of these symbols, you need to read this guide:

AttentionConfig, VLLM_ATTENTION_BACKEND, PassConfig, ModelConfig, embed_input_ids, embed_multimodal, selective_state_update, compile_ranges, encoding_format

Breaking Changes

Issue #1

PassConfig flags have been renamed per RFC #27995.
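
If you set these flags programmatically, they live on the pass_config of the compilation config. Below is a minimal sketch of where to apply the renames, assuming the vllm.config.CompilationConfig / PassConfig objects and the offline LLM API; the renamed flag names themselves are not reproduced here and should be taken from RFC #27995.

```python
# Minimal sketch of where PassConfig flags are set (offline API).
# The renamed flag names are not reproduced here; take them from
# RFC #27995 and set them on PassConfig below.
from vllm import LLM
from vllm.config import CompilationConfig, PassConfig

llm = LLM(
    model="facebook/opt-125m",
    compilation_config=CompilationConfig(
        # e.g. PassConfig(<renamed_flag>=True) per RFC #27995
        pass_config=PassConfig(),
    ),
)
```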

Issue #2

The environment variable VLLM_ATTENTION_BACKEND has been removed; use the --attention-backend CLI argument instead.
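
For example, a launcher script that previously exported VLLM_ATTENTION_BACKEND now has to pass the flag on the command line. A minimal sketch, assuming the `vllm serve` entry point and FLASH_ATTN as an illustrative backend name:

```python
# Minimal sketch of a launcher update; the backend name is an example.
import subprocess

# Before v0.13.0 (no longer honored):
#   env = {**os.environ, "VLLM_ATTENTION_BACKEND": "FLASH_ATTN"}
#   subprocess.run(["vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct"], env=env)

# From v0.13.0 on, select the backend via the CLI argument instead:
subprocess.run([
    "vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct",
    "--attention-backend", "FLASH_ATTN",
])
```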

Issue #3

The -O.xx flag has been removed.
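
If your launch commands used the dotted -O.xx form, spell the compilation options out directly instead. A hedged sketch follows; the "-O 3" optimization-level shorthand in the replacement is an assumption and should be checked against the v0.13.0 CLI docs.

```python
# Minimal sketch of a launch-command update.  "-O 3" is an assumed
# replacement; only the dotted "-O.xx" overrides were removed here.
import subprocess

# Before v0.13.0 (dotted field override, now rejected):
#   subprocess.run(["vllm", "serve", "facebook/opt-125m", "-O.level=3"])

# From v0.13.0 on, pass the optimization level (or a full compilation
# config) directly:
subprocess.run(["vllm", "serve", "facebook/opt-125m", "-O", "3"])
```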

Issue #4

Deprecated plugin and compilation fields have been removed.

Issue #5

Deprecated task, seed, and Multi-Modal (MM) settings have been removed.

Issue #6

The embed_input_ids and embed_multimodal fallback mechanisms have been removed.

Issue #7

The tokenizer setter has been removed.
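
If your code assigned a tokenizer to the engine after construction, move that choice to initialization time. A minimal sketch, assuming the offline LLM API and its tokenizer argument; the post-construction assignment shown as "before" is illustrative.

```python
# Minimal sketch: select the tokenizer at construction time instead of
# assigning it through the removed setter afterwards.
from vllm import LLM

# Before v0.13.0 (illustrative; the post-construction setter is gone):
#   llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
#   llm.llm_engine.tokenizer = my_custom_tokenizer   # no longer supported

# From v0.13.0 on, pass the tokenizer name or path up front:
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tokenizer="meta-llama/Llama-3.1-8B-Instruct",  # or a local path
)
```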

Migration Steps

  1. Update deployment scripts to replace the VLLM_ATTENTION_BACKEND environment variable with the --attention-backend CLI flag.
  2. Review and update PassConfig flag names in custom configurations to match RFC #27995.
  3. Replace usage of --convert reward with --convert embed (see the sketch after this list).
  4. Remove any usage of the deprecated -O.xx optimization flags.
  5. Ensure tokenizer initialization does not rely on the removed tokenizer setter.
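
For step 3, the change is a direct flag substitution in your serve commands. A minimal sketch, assuming the `vllm serve` entry point and an illustrative reward-model checkpoint name:

```python
# Minimal sketch for step 3: swap the removed "--convert reward" value
# for "--convert embed".  The model name is illustrative.
import subprocess

# Before v0.13.0:
#   subprocess.run(["vllm", "serve", "my-org/my-reward-model", "--convert", "reward"])

# From v0.13.0 on:
subprocess.run(["vllm", "serve", "my-org/my-reward-model", "--convert", "embed"])
```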

Release Summary

vLLM v0.13.0 introduces support for NVIDIA Blackwell Ultra and DeepSeek-V3.2, alongside a major performance overhaul for Whisper models. This release transitions attention configuration from environment variables to CLI arguments and includes significant core engine optimizations like Model Runner V2.

Need More Details?

View the full release notes and all changes for vLLM v0.13.0.
