MCP Tool Evaluation

Evaluate Your MCP Tools with Confidence

A powerful Node.js package and GitHub Action for evaluating Model Context Protocol (MCP) tool implementations using LLM-based scoring.

```typescript
import { grade } from "mcp-evals";
import { openai } from "@ai-sdk/openai";

// Evaluate your MCP tool
const result = await grade(
  openai("gpt-4"),
  "What is the weather in New York?"
);

console.log(JSON.parse(result));
```

Key Features

Everything you need to evaluate and improve your MCP tool implementations

LLM-Based Scoring

Leverage powerful language models to evaluate your MCP tools with nuanced understanding.

GitHub Action

Seamlessly integrate evaluations into your CI/CD pipeline with our GitHub Action.

Comprehensive Metrics

Get detailed scores on accuracy, completeness, relevance, clarity, and reasoning.

Custom Evaluations

Create tailored evaluation functions specific to your MCP tool requirements.

PR Integration

Automatically post evaluation results as comments on pull requests.

Detailed Feedback

Receive actionable insights with strengths and weaknesses highlighted.

Installation

Get started with mcp-evals in minutes

As a Node.js Package

```bash
npm install mcp-evals
```

As a GitHub Action

```yaml
name: Run MCP Evaluations

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      - name: Run MCP Evaluations
        uses: mclenhard/mcp-evals@v1.0.9
        with:
          evals_path: 'src/evals/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4' # Optional, defaults to gpt-4
```
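The workflow reads your OpenAI key from the repository's secrets, so it must be configured before the action can run. One way to add it is with the GitHub CLI (a sketch, assuming `gh` is installed and authenticated for your repository):

```bash
# Store the OpenAI API key as a repository secret named OPENAI_API_KEY,
# matching the secrets.OPENAI_API_KEY reference in the workflow above.
# gh prompts for the value interactively; you can also pass it via --body.
gh secret set OPENAI_API_KEY
```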

Usage

Simple integration for powerful evaluation

1. Create Your Evaluation File

Create a file (e.g., evals.ts) that exports your evaluation configuration:

```typescript
import { openai } from "@ai-sdk/openai";
import { grade, EvalConfig, EvalFunction } from "mcp-evals";

const weatherEval: EvalFunction = {
  name: "Weather Tool Evaluation",
  description: "Evaluates the accuracy and completeness of weather information retrieval",
  run: async () => {
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  },
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval],
};

export default config;

export const evals = [
  weatherEval,
  // add other evals here
];
```
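Because `run` is just an async function, an `EvalFunction` can wrap whatever grading logic fits your tool. Below is a minimal sketch of a custom evaluation that grades several queries and averages the scores; only `grade` and the `EvalFunction` shape come from mcp-evals, while the query list and the averaging logic are illustrative assumptions, not built-in features:

```typescript
import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction } from "mcp-evals";

// Hypothetical custom eval: grade several representative queries and
// average each metric. The aggregation below is an illustrative choice.
export const multiQueryWeatherEval: EvalFunction = {
  name: "Weather Tool Multi-Query Evaluation",
  description: "Averages scores across several representative weather queries",
  run: async () => {
    const queries = [
      "What is the weather in New York?",
      "Will it rain in London tomorrow?",
    ];
    const results = await Promise.all(
      queries.map(async (query) => JSON.parse(await grade(openai("gpt-4"), query)))
    );
    // Average a numeric metric across all graded queries.
    const avg = (key: string) =>
      results.reduce((sum, r) => sum + r[key], 0) / results.length;
    return {
      accuracy: avg("accuracy"),
      completeness: avg("completeness"),
      relevance: avg("relevance"),
      clarity: avg("clarity"),
      reasoning: avg("reasoning"),
      overall_comments: results.map((r) => r.overall_comments).join(" | "),
    };
  },
};
```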

2. Run the Evaluations

As a Node.js Package

You can run the evaluations using the CLI:

```bash
npx mcp-eval path/to/your/evals.ts path/to/your/server.ts
```

As a GitHub Action

The action will automatically:

  • Run your evaluations
  • Post the results as a comment on the PR
  • Update the comment if the PR is updated

Evaluation Results

Comprehensive scoring to improve your MCP tools

Sample Evaluation Result

  • Accuracy: 4.5/5 (how accurate the tool responses are)
  • Completeness: 4.2/5 (how complete the information provided is)
  • Relevance: 4.8/5 (how relevant the response is to the query)
  • Clarity: 4.0/5 (how clear and understandable the response is)
  • Reasoning: 4.3/5 (how well the tool demonstrates logical reasoning)
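The package reports these five metrics separately rather than as a single composite score; if you want one headline number, a simple average is one option. For the sample above, that works out to (4.5 + 4.2 + 4.8 + 4.0 + 4.3) / 5 = 4.36.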

Result Structure

```typescript
interface EvalResult {
  accuracy: number;         // Score from 1-5
  completeness: number;     // Score from 1-5
  relevance: number;        // Score from 1-5
  clarity: number;          // Score from 1-5
  reasoning: number;        // Score from 1-5
  overall_comments: string; // Summary of strengths and weaknesses
}
```
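Since every metric is a number from 1 to 5, it is straightforward to gate a build on a minimum score once you have an EvalResult in hand. Here is a minimal sketch assuming the EvalResult shape above; the assertMeetsThreshold helper and the 3.5 cutoff are illustrative choices, not part of the package:

```typescript
// Hypothetical quality gate: exit non-zero if any metric falls below a
// chosen threshold. EvalResult mirrors the interface above; the helper
// name and the 3.5 default are illustrative assumptions.
interface EvalResult {
  accuracy: number;
  completeness: number;
  relevance: number;
  clarity: number;
  reasoning: number;
  overall_comments: string;
}

function assertMeetsThreshold(result: EvalResult, threshold = 3.5): void {
  const metrics: Array<[string, number]> = [
    ["accuracy", result.accuracy],
    ["completeness", result.completeness],
    ["relevance", result.relevance],
    ["clarity", result.clarity],
    ["reasoning", result.reasoning],
  ];
  const failing = metrics.filter(([, score]) => score < threshold);
  if (failing.length > 0) {
    console.error("Metrics below threshold:", failing, result.overall_comments);
    process.exit(1); // fail the CI job
  }
}
```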

Latest from Our Blog

Insights, tutorials, and updates about MCP evaluations

Learn how our new Node.js package and GitHub Action can help you evaluate and improve your Model Context Protocol tool implementations.

April 25, 2024 · Announcement · MCP · Evaluation

A guide to understanding Google's A2A (Agent-to-Agent) protocol.

April 20, 2024 · Best Practices · Tutorials

A mathematical approach to understanding the difference between tools and agents.

April 15, 2024 · Metrics · Documentation

Ready to Evaluate Your MCP Tools?

Start improving your Model Context Protocol implementations today