Vitest - Braintrust

Vitest is a test runner for JavaScript and TypeScript. Braintrust supports two Vitest workflows:

Use the Braintrust wrapVitest helper to write Vitest tests that run as Braintrust evals.
Use the vitest-evals reporter to report vitest-evals test runs to Braintrust.

Setup

Install Braintrust alongside Vitest:

npm install braintrust vitest

Set your Braintrust API key as an environment variable:

export BRAINTRUST_API_KEY=<your-api-key>

Separate evals from unit tests

Eval files are regular Vitest files and can live anywhere in your project. Because evals typically run slower than unit tests and log results to Braintrust, a common convention is to give them a .eval.ts suffix or keep them in a dedicated evals/ directory, and to run them with a separate Vitest config:

vitest.eval.config.ts

import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    include: ["**/*.eval.ts"],
    testTimeout: 30000,
  },
});

Run evals separately from unit tests:

# Unit tests
npx vitest run

# Evals
npx vitest run --config vitest.eval.config.ts
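
If your evals live in a dedicated evals/ directory but use regular .test.ts names, the default Vitest config would pick them up as unit tests. The following is a minimal sketch of a unit-test config that skips them. It assumes an evals/ directory at the project root (a hypothetical layout) and uses Vitest's exported configDefaults to keep the built-in excludes.

```typescript
// vitest.config.ts (unit tests only)
import { configDefaults, defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Keep Vitest's default excludes and additionally skip eval files.
    // "evals/**" is an assumed directory name; adjust it to your layout.
    exclude: [...configDefaults.exclude, "evals/**", "**/*.eval.ts"],
  },
});
```

With this in place, `npx vitest run` should ignore eval files, and the eval config remains the only way to run them.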

Run evals with `wrapVitest`

Call wrapVitest once at the top of your test file, passing in the Vitest module (imported as a namespace). Use the returned test, describe, and expect in place of the standard ones.

my-eval.eval.ts

import * as vitest from "vitest";
import { wrapVitest } from "braintrust";

const { test, expect, describe } = wrapVitest(vitest, {
  projectName: "my-project", // Replace with your project name
});

describe("My eval suite", () => {
  test(
    "basic check",
    {
      input: { prompt: "What is 1 + 1?" },
      expected: "2",
    },
    async ({ input, expected }) => {
      const output = await myModel(input.prompt);
      expect(output).toBe(expected);
      return output;
    },
  );
});
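
The example above calls myModel, which the snippet does not define. To try the workflow before wiring up a real model, a deterministic stand-in can help. The sketch below is purely illustrative: myModel, cannedAnswer, and the canned answers are hypothetical and not part of Braintrust.

```typescript
// Hypothetical stand-in for a real model call; swap in your own client.
const CANNED_ANSWERS: Record<string, string> = {
  "What is 1 + 1?": "2",
};

// Synchronous lookup, kept separate so it is easy to check on its own.
function cannedAnswer(prompt: string): string {
  return CANNED_ANSWERS[prompt] ?? "I don't know";
}

// Async wrapper with the call shape used in the example: myModel(prompt).
async function myModel(prompt: string): Promise<string> {
  return cannedAnswer(prompt);
}
```

Because the stub is deterministic, the "basic check" test should pass every time, which makes it easier to tell integration problems apart from model behavior.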

Run it with the eval config:

npx vitest run --config vitest.eval.config.ts

After the suite finishes, Braintrust prints a summary to your terminal and creates an experiment with one traced span per test case.

Report `vitest-evals` runs to Braintrust

Use the Braintrust Vitest evals reporter when you already write evaluations with the vitest-evals package and want those runs logged to Braintrust. This workflow is separate from the standard Braintrust Eval() framework and from the wrapVitest helper.

Install the reporter dependencies:

npm install braintrust vitest vitest-evals

Configure Vitest with both the vitest-evals reporter and the Braintrust reporter:

vitest.evals.config.mts

import { defineConfig } from "vitest/config";
import BraintrustVitestEvalsReporter from "braintrust/vitest-evals-reporter";

export default defineConfig({
  test: {
    include: ["**/*.eval.ts"],
    reporters: [
      "default",
      "vitest-evals/reporter",
      new BraintrustVitestEvalsReporter({
        projectName: "refund-agent", // Replace with your project name
        experimentName: `vitest-evals-${new Date().toISOString()}`,
      }),
    ],
    testTimeout: 30000,
  },
});

Write eval tests with vitest-evals primitives. The Braintrust reporter reads the eval metadata produced by vitest-evals/reporter and logs each eval case as a Braintrust span.

refund.eval.ts

import { expect } from "vitest";
import { createHarness, createJudge, describeEval } from "vitest-evals";

type RefundOutput = {
  message: string;
  status: "approved" | "denied";
};

const refundHarness = createHarness<string, RefundOutput>({
  name: "refund-harness",
  run: async ({ input }) => ({
    output: {
      message: "Invoice inv_123 is refundable and the refund is approved.",
      status: "approved",
    },
    events: [
      { type: "message", role: "user", content: input },
      {
        type: "tool_call",
        id: "call_lookup",
        name: "lookupInvoice",
        arguments: { invoiceId: "inv_123" },
      },
      {
        type: "tool_result",
        toolCallId: "call_lookup",
        name: "lookupInvoice",
        content: { refundable: true },
      },
      {
        type: "message",
        role: "assistant",
        content: "Invoice inv_123 is refundable and the refund is approved.",
      },
    ],
    usage: {
      inputTokens: 11,
      outputTokens: 13,
      totalTokens: 24,
      toolCalls: 1,
    },
  }),
});

const StatusJudge = createJudge<
  string,
  RefundOutput,
  { expectedStatus: RefundOutput["status"] }
>("StatusJudge", async ({ output, expectedStatus }) => ({
  metadata: {
    expectedStatus,
    observedStatus: output.status,
  },
  score: output.status === expectedStatus ? 1 : 0,
}));

describeEval("refund agent", { harness: refundHarness }, (it) => {
  it("approves refundable invoice", async ({ run }) => {
    const result = await run("Refund invoice inv_123");

    expect(result.output.status).toBe("approved");
    await expect(result).toSatisfyJudge(StatusJudge, {
      expectedStatus: "approved",
      threshold: 1,
    });
  });
});

Run Vitest with the reporter config:

npx vitest run --config vitest.evals.config.mts

The reporter creates or reuses a Braintrust experiment for the run. Each eval test logs:

The test input, output, status, file path, and full test name.
Scores from judges and assertions, including avg_score and pass when provided by vitest-evals.
Harness metadata, session messages, tool calls, artifacts, usage metrics, and errors.
Nested model, tool, and trace spans when the harness includes normalized trace data.

Reporter options

Pass options to new BraintrustVitestEvalsReporter() to control where results are logged:

| Option | Description |
| --- | --- |
| `projectName` | Braintrust project name. Required unless `projectId` is set. |
| `projectId` | Braintrust project ID. Required unless `projectName` is set. |
| `experimentName` | Experiment name. Defaults to a timestamped `vitest-evals-*` name. |
| `displaySummary` | Whether to print the Braintrust experiment summary after the run. |
| `metadata` | Experiment-level metadata. |
| `tags` | Experiment-level tags. |
| `baseExperiment` | Base experiment name for comparisons. |
| `baseExperimentId` | Base experiment ID for comparisons. |
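
One way to keep these options consistent between local and CI runs is to build them in a small helper. The sketch below is an assumption-heavy illustration: buildReporterOptions, the ReporterOptions type, and the GIT_SHA and CI environment variable names are hypothetical, and the value types (a string map for metadata, a string array for tags) are guesses rather than the library's published types.

```typescript
// Subset of the options listed above; value types are assumptions.
type ReporterOptions = {
  projectName: string;
  experimentName: string;
  displaySummary: boolean;
  metadata: Record<string, string>;
  tags: string[];
};

// GIT_SHA and CI are hypothetical environment variable names.
function buildReporterOptions(
  env: Record<string, string | undefined>,
  now: Date = new Date(),
): ReporterOptions {
  const sha = env["GIT_SHA"] ?? "local";
  const isCi = env["CI"] === "true";
  return {
    projectName: "refund-agent", // Replace with your project name
    experimentName: `vitest-evals-${sha}-${now.toISOString()}`,
    displaySummary: !isCi, // Print the summary only for local runs
    metadata: { commit: sha },
    tags: isCi ? ["ci"] : ["local"],
  };
}
```

In the Vitest config, the result would be passed as `new BraintrustVitestEvalsReporter(buildReporterOptions(process.env))`.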

Key concepts

`wrapVitest`

wrapVitest wraps Vitest’s test, describe, and expect with Braintrust tracking.

import * as vitest from "vitest";
import { wrapVitest } from "braintrust";

const { test, expect, describe } = wrapVitest(vitest, {
  projectName: "my-project",
  displaySummary: true,
});

Each describe creates one Braintrust experiment. Braintrust appends a timestamp to make each run unique. The project groups experiments together and defaults to the suite name if projectName is not set.

Test configuration

test accepts an optional config object between the name and the test function:

test(
  "test name",
  {
    input: { prompt: "Hello" },
    expected: "Hello!",
    metadata: { category: "greeting" },
    tags: ["smoke"],
    scorers: [myScorer],
    data: [{ input: "Hello", expected: "Hello!" }],
  },
  async ({ input, expected, metadata }) => {
    return myFunction(input);
  },
);

Scorers

A scorer receives { output, expected, input, metadata } and returns a name and score:

const exactMatch = ({ output, expected }: { output: unknown; expected: unknown }) => ({
  name: "exact_match",
  score: output === expected ? 1 : 0,
});

Scorers run after each test, including failed tests. Errors inside scorers are caught and logged.

Scorers from the autoevals package can be passed in the same way:

import { Factuality, Levenshtein } from "autoevals";

test(
  "quality",
  {
    input: { prompt: "What is 1 + 1?" },
    expected: "2",
    scorers: [Factuality, Levenshtein],
  },
  async ({ input }) => {
    return myModel(input.prompt);
  },
);
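
Scorers do not have to be all-or-nothing. As an illustration of the { name, score } contract described above, here is a sketch of a scorer that gives partial credit. The ScorerArgs and Score types and the lenient_match name are hypothetical, written for this example rather than taken from the Braintrust SDK, and the 0.5 value is an arbitrary choice.

```typescript
// Argument shape described above; the exact library types are an assumption.
type ScorerArgs = {
  output: unknown;
  expected?: unknown;
  input?: unknown;
  metadata?: Record<string, unknown>;
};

type Score = { name: string; score: number };

// Trim and lower-case so that "Hello!" and " hello! " compare equal.
const normalize = (value: unknown): string => String(value).trim().toLowerCase();

// 1 for a normalized exact match, 0.5 when the output merely contains the
// expected answer, 0 otherwise (including when no expected value is given).
function lenientMatch({ output, expected }: ScorerArgs): Score {
  const name = "lenient_match";
  if (expected === undefined) return { name, score: 0 };
  const out = normalize(output);
  const exp = normalize(expected);
  if (out === exp) return { name, score: 1 };
  if (exp.length > 0 && out.includes(exp)) return { name, score: 0.5 };
  return { name, score: 0 };
}
```

A scorer like this could be listed in scorers next to an exact-match scorer, so a verbose but correct answer is distinguishable from a wrong one.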

Logging helpers

Use logOutputs and logFeedback inside a wrapVitest test to log additional data to the current span:

logOutputs({ summary, tokens_used: 412 });

Inline data and dataset support

Define data inline:

test(
  "sentiment",
  {
    data: [
      { input: "great product!", expected: "positive" },
      { input: "terrible experience", expected: "negative" },
    ],
    scorers: [
      ({ output, expected }) => ({
        name: "accuracy",
        score: output === expected ? 1 : 0,
      }),
    ],
  },
  async ({ input }) => classifySentiment(input),
);

Or load from a managed Braintrust dataset:

import { initDataset } from "braintrust";
import { Factuality } from "autoevals";

const data = await initDataset({
  project: "my-project",
  dataset: "my-dataset",
}).fetchedData();

test("eval", { data, scorers: [Factuality] }, async ({ input }) => {
  return myModel(input.prompt);
});

Both approaches expand into separate test cases and Braintrust spans automatically.
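
Before logging anything, it can be useful to dry-run the data and scorer outside Braintrust to confirm that the rows and scores look as expected. The sketch below mirrors the one-case-per-row idea in plain TypeScript. It is not how wrapVitest is implemented, and the keyword-based classifySentiment is a hypothetical stand-in for a real classifier.

```typescript
type Row = { input: string; expected: string };

const data: Row[] = [
  { input: "great product!", expected: "positive" },
  { input: "terrible experience", expected: "negative" },
];

// Hypothetical keyword classifier standing in for a real model call.
function classifySentiment(input: string): string {
  return /great|good|love/i.test(input) ? "positive" : "negative";
}

// Same scorer shape as the inline example above.
const accuracy = ({ output, expected }: { output: string; expected: string }) => ({
  name: "accuracy",
  score: output === expected ? 1 : 0,
});

// One result per row: run the task, then score its output.
const results = data.map((row) => {
  const output = classifySentiment(row.input);
  return { ...row, output, ...accuracy({ output, expected: row.expected }) };
});

// Average score across rows.
const meanAccuracy =
  results.reduce((sum, r) => sum + r.score, 0) / results.length;
```

If a row scores unexpectedly here, the problem is in the data or the scorer rather than in the Braintrust integration.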
