Evaluate Your MCP Tools with Confidence
A powerful Node.js package and GitHub Action for evaluating Model Context Protocol (MCP) tool implementations using LLM-based scoring.
```typescript
import { grade } from "mcp-evals";
import { openai } from "@ai-sdk/openai";

// Evaluate your MCP tool
const result = await grade(
  openai("gpt-4"),
  "What is the weather in New York?"
);

console.log(JSON.parse(result));
```
Key Features
Everything you need to evaluate and improve your MCP tool implementations
- Leverage powerful language models to evaluate your MCP tools with nuanced understanding.
- Seamlessly integrate evaluations into your CI/CD pipeline with our GitHub Action.
- Get detailed scores on accuracy, completeness, relevance, clarity, and reasoning.
- Create tailored evaluation functions specific to your MCP tool requirements.
- Automatically post evaluation results as comments on pull requests.
- Receive actionable insights with strengths and weaknesses highlighted.
Installation
Get started with mcp-evals in minutes
As a Node.js Package
```bash
npm install mcp-evals
```
As a GitHub Action
```yaml
name: Run MCP Evaluations
on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm install

      - name: Run MCP Evaluations
        uses: mclenhard/mcp-evals@v1.0.9
        with:
          evals_path: 'src/evals/evals.ts'
          server_path: 'src/index.ts'
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
          model: 'gpt-4' # Optional, defaults to gpt-4
```
Usage
Simple integration for powerful evaluation
1. Create Your Evaluation File
Create a file (e.g., evals.ts) that exports your evaluation configuration:
```typescript
import { EvalConfig } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction } from "mcp-evals";

const weatherEval: EvalFunction = {
  name: 'Weather Tool Evaluation',
  description: 'Evaluates the accuracy and completeness of weather information retrieval',
  run: async () => {
    const result = await grade(openai("gpt-4"), "What is the weather in New York?");
    return JSON.parse(result);
  }
};

const config: EvalConfig = {
  model: openai("gpt-4"),
  evals: [weatherEval]
};

export default config;

export const evals = [
  weatherEval,
  // add other evals here
];
```
2. Run the Evaluations
As a Node.js Package
You can run the evaluations using the CLI:
```bash
npx mcp-eval path/to/your/evals.ts path/to/your/server.ts
```
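The grader calls the OpenAI API through @ai-sdk/openai, which reads its key from the `OPENAI_API_KEY` environment variable, so the key must be set before running the CLI locally. A minimal local run might look like the sketch below; the paths reuse the `src/evals/evals.ts` and `src/index.ts` layout from the GitHub Action example above and should be adjusted to your project:

```bash
# @ai-sdk/openai reads the key from the OPENAI_API_KEY environment variable.
export OPENAI_API_KEY="sk-..."   # your own key

# Example paths; point these at your eval file and MCP server entry point.
npx mcp-eval src/evals/evals.ts src/index.ts
```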
As a GitHub Action
The action will automatically:
- Run your evaluations
- Post the results as a comment on the PR
- Update the comment if the PR is updated
Evaluation Results
Comprehensive scoring to improve your MCP tools
Result Structure
```typescript
interface EvalResult {
  accuracy: number;          // Score from 1-5
  completeness: number;      // Score from 1-5
  relevance: number;         // Score from 1-5
  clarity: number;           // Score from 1-5
  reasoning: number;         // Score from 1-5
  overall_comments: string;  // Summary of strengths and weaknesses
}
```
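Because `grade()` returns this structure as a JSON string (as in the examples above), you can layer your own pass/fail logic on top of it. Here is a minimal sketch of a threshold check; the 3/5 cutoff and the non-zero exit code are illustrative choices, not part of the package:

```typescript
import { grade } from "mcp-evals";
import { openai } from "@ai-sdk/openai";

// Parsed shape matches the EvalResult interface above.
const result = JSON.parse(
  await grade(openai("gpt-4"), "What is the weather in New York?")
);

const scores = [
  result.accuracy,
  result.completeness,
  result.relevance,
  result.clarity,
  result.reasoning,
];

// Illustrative gate: treat any dimension below 3/5 as a failure.
if (scores.some((score: number) => score < 3)) {
  console.error(result.overall_comments);
  process.exit(1);
}
```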
Latest from Our Blog
Insights, tutorials, and updates about MCP evaluations
- Learn how our new Node.js package and GitHub Action can help you evaluate and improve your Model Context Protocol tool implementations.
- A guide to understanding Google's A2A (Agent-to-Agent) protocol.
- A mathematical approach to understanding the difference between tools and agents.