RAGAS
AI & LLMs
Supercharge Your LLM Application Evaluations 🚀
Release History
v0.4.2 (Breaking; 9 fixes, 13 features): This release focuses heavily on migrating core metrics to the new collections API structure and introduces caching support for metrics and embeddings. Several bug fixes address issues related to instructor modes, type validation, and Claude workflow tokens.
v0.4.1 (Breaking; 2 fixes, 6 features): This release migrates core metrics (ToolCallAccuracy, ToolCallF1, TopicAdherence, AgentGoalAccuracy, Rubrics) to the collections API for better structure. It also introduces a breaking change by renaming `embed_text` to `aembed_text` in AnswerRelevancy.
v0.4.0 (Breaking; 9 fixes, 5 features): This release introduces major architectural updates, migrating numerous metrics to a modular BasePrompt system and enhancing LLM provider support via `instructor.from_provider` and dual adapter capabilities. It also includes several bug fixes related to LangChain integration and LLM detection.
v0.3.9 (Breaking; 5 fixes, 9 features): This release migrates core metrics to a new structure, removes deprecated metrics such as 'aspect critic', and introduces new features such as traceability metadata for synthetic data. It also includes several documentation fixes and minor bug fixes related to OpenAI models.
v0.3.8 (Breaking; 5 fixes, 6 features): This release focuses on internal refactoring: it migrates core functionality such as semantic similarity and simple criteria to collections and merges the LLM factory methods. It also fixes several bugs related to async handling and specific synthesizers.
v0.3.7 (4 fixes, 4 features): This release migrates several core metrics (BLEU, string metrics, answer similarity) to collections, improves robustness in query distribution, and adds new configuration options for LLM wrappers. Internal code quality and documentation were also improved.
v0.3.6 (15 fixes, 10 features): This release introduces several new features, including CHRF score support, enhanced input flexibility for metrics, and OCI Gen AI integration. Numerous bug fixes address issues related to asyncio, metric calculations, and dependency compatibility.
v0.3.5 (3 fixes, 4 features): This release focuses on improving core functionality, including better async execution and knowledge graph optimization, alongside several bug fixes and documentation updates.
v0.3.5rc2: No release notes provided.
v0.3.5rc1 (2 fixes, 4 features): This release focuses on improving asynchronous operations, optimizing knowledge graph handling for large datasets, and fixing a TypeError in metric calculations. It also introduces telemetry collection.
v0.3.4 (2 fixes, 1 feature): This release focuses on performance improvements, documentation updates, and minor bug fixes, including optimizing cluster finding and fixing batching issues with LangChain.
v0.3.3 (Breaking; 19 fixes, 11 features): This release focuses heavily on internal restructuring, moving modules such as `tracing`, `prompts`, `dataset`, and experimental features into the main package while retiring the `ragas.experimental` namespace. Numerous bug fixes address CI, LLM compatibility (especially with the OpenAI o1 series), and metric stability.
v0.3.3rc1 (Breaking; 20 fixes, 11 features): This release focuses heavily on internal restructuring, moving modules such as `tracing`, `prompts`, `dataset`, and experimental metrics out of the experimental namespaces and into the main package. It also includes numerous bug fixes, performance optimizations (such as a 50% speedup for factual correctness), and improved LLM compatibility.
v0.3.2 (Breaking; 3 fixes, 3 features): This release moves key features such as `experiment` and the CLI from experimental to the main package, adds prompt saving/loading capabilities, and removes the simulation feature.
v0.3.2rc3: No release notes provided.
v0.3.2-rc2 (1 fix): This release primarily fixes PyPI requirements and absolute image paths.
v0.3.2-rc1 (Breaking; 2 fixes, 4 features): This release moves key features such as `experiment` and the CLI from experimental to the main package, removes simulation functionality, and adds support for Python 3.13.
v0.3.1 (4 fixes, 1 feature): This release introduces a new Google Drive backend for dataset storage and includes several documentation and example improvements, alongside minor configuration fixes.
v0.3.0 (Breaking; 6 fixes, 10 features): This release introduces major features such as LlamaIndex agentic integration, a new CLI, and security enhancements, including a fix for CVE-2025-45691. It also includes significant internal refactoring, notably the removal of the Project structure.
v0.3.0-rc2: No release notes provided.
v0.3.0-rc1: No release notes provided.
v0.2.15 (1 fix, 4 features): This release introduces new integrations with AWS Bedrock, LlamaStack, and Griptape, alongside enhancements to validation logic and documentation updates. A key documentation change renames AWS Bedrock references to Amazon Bedrock.
v0.2.14 (8 fixes, 6 features): This release introduces new features such as HTTP request-response logging and multi-turn conversation evaluation, alongside numerous bug fixes across various metrics and synthesizers. It also includes documentation updates and new integrations.
v0.2.13 (Breaking; 3 fixes, 2 features): This release focuses on bug fixes, prompt improvements, and enhancements to integrations such as LangGraph, and removes an unnecessary argument from ToolCallAccuracy initialization.
v0.2.12 (3 fixes, 2 features): This release introduces Bedrock token parser support and an optional parameter for the BLEU score, alongside several bug fixes for TP/FP calculations and the output parser.
v0.2.11 (5 fixes, 6 features): This release introduces new features such as Swarm integration and the ability to specify an experiment name during evaluation. It also includes several bug fixes related to metrics and dependency management, alongside numerous documentation updates.