BitsAndBytes
AI & LLMs: Accessible large language models via k-bit quantization for PyTorch.
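The core idea behind k-bit quantization can be illustrated with a minimal absmax int8 sketch in plain NumPy. This is a conceptual illustration only, under the assumption of simple per-tensor absmax scaling; it is not bitsandbytes' actual kernels or API.

```python
import numpy as np

# Conceptual sketch of absmax int8 quantization, the idea behind k-bit
# quantization of model weights (NOT bitsandbytes' real implementation).
def absmax_quantize(x: np.ndarray):
    # Map the largest-magnitude value onto the int8 limit (+/-127).
    scale = 127.0 / float(np.max(np.abs(x)))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original float values.
    return q.astype(np.float32) / scale

weights = np.array([0.5, -1.2, 3.4, -0.05], dtype=np.float32)
q, scale = absmax_quantize(weights)
restored = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(weights - restored)) <= 0.5 / scale
```

Each value is stored as a single int8 plus one shared scale factor, which is roughly a 4x memory reduction over float32; the library's 4-bit and block-wise schemes refine this same idea.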
Release History
continuous-release_main (1 feature): This pre-release provides the latest development wheels for all supported platforms, automatically rebuilt on each commit to the main branch. Installation requires using a specific wheel URL matching the target operating system and architecture.
0.49.1 (1 fix): This patch release updates AMD targets and adds a safety guard for the quantization state attribute.
0.49.0 (Breaking, 4 fixes, 5 features): This release brings significant performance boosts for x86-64 CPUs, introduces experimental ROCm support via PyPI wheels, and adds compatibility for macOS 14+. Support for Python 3.9 and Maxwell GPUs has been dropped.
0.48.2 (2 fixes, 1 feature): Version 0.48.2 fixes critical bugs related to quantization indexing and CPU/disk offloading regressions, and introduces Windows build support for SYCL kernels on XPU.
0.48.1 (2 fixes): Version 0.48.1 addresses a critical regression in LLM.int8() affecting inference with pre-quantized checkpoints and fixes an issue with 8-bit parameter device movement.
0.48.0 (Breaking, 3 fixes, 9 features): This release introduces official support for Intel GPUs and Intel Gaudi accelerators, alongside significant performance improvements for CUDA 4-bit dequantization kernels and compatibility updates for PyTorch and CUDA versions. Support for PyTorch 2.2 and Maxwell GPUs has been dropped.
0.47.0 (8 fixes, 9 features): This release introduces FSDP2 compatibility for Params4bit and significantly expands hardware support by improving CPU/XPU coverage and adding Volta support to recent CUDA builds. Several bugs related to 4-bit quantization and documentation have also been resolved.
0.46.1 (2 fixes, 1 feature): This release improves torch.compile compatibility for Params4bit, fixes documentation issues, and adds support for CUDA 12.9 builds. It also streamlines the build process by automatically invoking CMake during PEP 517 builds.
0.46.0 (Breaking, 8 fixes, 6 features): This release introduces significant improvements to `torch.compile` compatibility for both LLM.int8() and 4-bit quantization, alongside a major refactoring to integrate with PyTorch Custom Operators. Support for Python 3.8 and older PyTorch versions has been dropped.
continuous-release_multi-backend-refactor: No release notes provided.
0.45.5 (1 fix, 1 feature): This minor release restores the CPU build of bitsandbytes, which was omitted from the v0.45.4 wheels.
0.45.4 (1 fix): This minor release improves CPU-only usage of bitsandbytes, featuring a bug fix and better system compatibility on Linux achieved by adjusting the build environment.
0.45.3 (4 fixes, 1 feature): This patch release introduces support for NVIDIA Blackwell GPUs via a new CUDA 12.8 build and includes several minor bug fixes.
0.45.2 (1 fix): This patch release resolves a RuntimeError raised during bitsandbytes import in PyTorch 2.6 environments where Triton was installed but no GPU was present.
0.45.1 (Breaking, 2 fixes, 2 features): This patch release focuses on dependency compatibility, notably setting the minimum PyTorch version to 2.0.0 and ensuring compatibility with triton>=3.2.0. It also includes build system updates and packaging cleanup.