Evaluate AI Agent Response Correctness with OpenAI and RAGAS Methodology
This n8n template demonstrates how to calculate the evaluation metric "Correctness", which in this scenario compares and classifies the agent's response against a set of ground truths.
The scoring approach is adapted from the open-source evaluations project RAGAS; you can see the source here: https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/_answer_correctness.py
How it works
- This evaluation works best when the agent's response is allowed to be more verbose and conversational.
- For our scoring, we classify the agent's response into 3 buckets: True Positive (in answer and ground truth), False Positive (in answer but not ground truth) and False Negative (not in answer but in ground truth).
- We also calculate an average similarity score of the agent's response against all ground truths.
- The classification and the similarity score are then averaged to give the final score (see the sketch after this list).
- A high score indicates the agent is accurate, whereas a low score could indicate the agent has incorrect training data or is not providing a comprehensive enough answer.
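
To make the scoring step concrete, here is a minimal sketch in TypeScript (not the template's actual Code node) of how the classification counts and the similarity scores could be combined into a final score. The names `Classification`, `f1Score`, and `scoreCorrectness` are illustrative only, and the sketch assumes an LLM has already classified the statements and an embedding model has already produced a similarity score per ground truth.

```typescript
// Sketch of the final scoring step. All names here are hypothetical;
// they are not the workflow's actual node code.

interface Classification {
  truePositives: string[];  // statements in the answer and in the ground truth
  falsePositives: string[]; // statements in the answer but not in the ground truth
  falseNegatives: string[]; // statements in the ground truth but missing from the answer
}

function f1Score(c: Classification): number {
  const tp = c.truePositives.length;
  const fp = c.falsePositives.length;
  const fn = c.falseNegatives.length;
  // F1 = tp / (tp + 0.5 * (fp + fn)); score 0 when there is nothing to classify
  return tp + fp + fn === 0 ? 0 : tp / (tp + 0.5 * (fp + fn));
}

function scoreCorrectness(c: Classification, similarities: number[]): number {
  // Average similarity of the agent's response against all ground truths
  const avgSimilarity =
    similarities.reduce((sum, s) => sum + s, 0) / Math.max(similarities.length, 1);
  // Final score: simple average of the factual F1 and the similarity, per the
  // description above (RAGAS itself defaults to a weighted 0.75/0.25 combination)
  return (f1Score(c) + avgSimilarity) / 2;
}

// Example: 3 supported statements, 1 unsupported, 1 missing, similarity ~0.9
const example: Classification = {
  truePositives: ["a", "b", "c"],
  falsePositives: ["d"],
  falseNegatives: ["e"],
};
console.log(scoreCorrectness(example, [0.92, 0.88])); // ≈ 0.825
```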
Requirements
- n8n version 1.94+
- Check out this Google Sheet for sample data: https://docs.google.com/spreadsheets/d/1YOnu2JJjlxd787AuYcg-wKbkjyjyZFgASYVV0jsij5Y/edit?usp=sharing