Evaluate Your MCP Tools with Confidence
A powerful Node.js package and GitHub Action for evaluating Model Context Protocol (MCP) tool implementations using LLM-based scoring.
```typescript
import { grade } from "mcp-evals";
import { openai } from "@ai-sdk/openai";

// Evaluate your MCP tool
const result = await grade(
  openai("gpt-4"),
  "What is the weather in New York?"
);

console.log(JSON.parse(result));
```
Key Features
Everything you need to evaluate and improve your MCP tool implementations
- Leverage powerful language models to evaluate your MCP tools with nuanced understanding.
- Seamlessly integrate evaluations into your CI/CD pipeline with our GitHub Action.
- Get detailed scores on accuracy, completeness, relevance, clarity, and reasoning.
- Create tailored evaluation functions specific to your MCP tool requirements.
- Automatically post evaluation results as comments on pull requests.
- Receive actionable insights with strengths and weaknesses highlighted.
Installation
Get started with mcp-evals in minutes
As a Node.js Package
```bash
npm install mcp-evals
```
As a GitHub Action
```yaml
name: Run MCP Evaluations
on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      - name: Run MCP Evaluations
        uses: mclenhard/mcp-evals@v1.0.9
        with:
          evals_path: 'src/evals/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4' # Optional, defaults to gpt-4
```
Usage
Simple integration for powerful evaluation
1. Create Your Evaluation File
Create a file (e.g., evals.ts) that exports your evaluation configuration:
```typescript
import { EvalConfig } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction } from "mcp-evals";

const weatherEval: EvalFunction = {
  name: 'Weather Tool Evaluation',
  description: 'Evaluates the accuracy and completeness of weather information retrieval',
  run: async () => {
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  }
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval]
};

export default config;

export const evals = [
  weatherEval,
  // add other evals here
];
```
2. Run the Evaluations
As a Node.js Package
You can run the evaluations using the CLI:
```bash
npx mcp-eval path/to/your/evals.ts path/to/your/server.ts
```
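The grader calls the OpenAI API through @ai-sdk/openai, which reads its key from the `OPENAI_API_KEY` environment variable, so the key must be set before running the CLI locally. A minimal local run might look like the sketch below; the paths reuse the `src/evals/evals.ts` and `src/index.ts` layout from the GitHub Action example above and should be adjusted to your project:

```bash
# @ai-sdk/openai reads the key from the OPENAI_API_KEY environment variable.
export OPENAI_API_KEY="sk-..."   # your own key

# Example paths; point these at your eval file and MCP server entry point.
npx mcp-eval src/evals/evals.ts src/index.ts
```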
As a GitHub Action
The action will automatically:
- Run your evaluations
- Post the results as a comment on the PR
- Update the comment if the PR is updated
Evaluation Results
Comprehensive scoring to improve your MCP tools
Result Structure
```typescript
interface EvalResult {
  accuracy: number;          // Score from 1-5
  completeness: number;      // Score from 1-5
  relevance: number;         // Score from 1-5
  clarity: number;           // Score from 1-5
  reasoning: number;         // Score from 1-5
  overall_comments: string;  // Summary of strengths and weaknesses
}
```
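Because `grade()` returns this structure as a JSON string (as in the examples above), you can layer your own pass/fail logic on top of it. Here is a minimal sketch of a threshold check; the 3/5 cutoff and the non-zero exit code are illustrative choices, not part of the package:

```typescript
import { grade } from "mcp-evals";
import { openai } from "@ai-sdk/openai";

// Parsed shape matches the EvalResult interface above.
const result = JSON.parse(
  await grade(openai("gpt-4"), "What is the weather in New York?")
);

const scores = [
  result.accuracy,
  result.completeness,
  result.relevance,
  result.clarity,
  result.reasoning,
];

// Illustrative gate: treat any dimension below 3/5 as a failure.
if (scores.some((score: number) => score < 3)) {
  console.error(result.overall_comments);
  process.exit(1);
}
```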
Latest from Our Blog
Insights, tutorials, and updates about MCP evaluations
- Learn how our new Node.js package and GitHub Action can help you evaluate and improve your Model Context Protocol tool implementations.
- A guide to understanding Google's A2A (Agent-to-Agent) protocol.
- A mathematical approach to understanding the difference between tools and agents.