Overview

Evaluations (Evals) help you validate that AI systems are accessing and retrieving your content correctly. Test queries, verify results, and ensure your data is being served as expected.

Why Evals Matter

When serving data to AI systems, you need confidence that:
  • Search and retrieval work correctly for your content
  • The right content surfaces for relevant queries
  • Access policies are working as intended
  • Your data is being used appropriately
Evals provide visibility and control over these interactions.

Evaluation Flow

What to evaluate at each stage:
  • Query Understanding - Does the system correctly interpret user intent?
  • Retrieval - Is the right content being found? Is it relevant and fresh?
  • Output - Does the final result meet quality standards?
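The three stages above can be sketched as a single eval pass. This is a minimal illustration only: the `interpret`, `retrieve`, and `generate` functions are hypothetical stand-ins for your own pipeline components.

```python
# Minimal sketch of a three-stage eval pass. All three pipeline
# functions are hypothetical stand-ins, not a real API.

def interpret(query: str) -> str:
    # Stage 1: query understanding (here, trivial normalization).
    return query.lower().strip()

def retrieve(intent: str, corpus: dict[str, str]) -> list[str]:
    # Stage 2: retrieval (here, naive keyword matching).
    return [doc_id for doc_id, text in corpus.items() if intent in text.lower()]

def generate(doc_ids: list[str]) -> str:
    # Stage 3: output (here, just listing the sources used).
    return "Sources: " + ", ".join(doc_ids) if doc_ids else "No results"

def run_eval(query: str, corpus: dict[str, str], expected_doc: str) -> dict:
    intent = interpret(query)
    hits = retrieve(intent, corpus)
    output = generate(hits)
    return {
        "query_understood": intent != "",
        "retrieval_hit": expected_doc in hits,
        "output_ok": output != "No results",
    }

corpus = {"doc-1": "How to configure webhooks", "doc-2": "Billing FAQ"}
result = run_eval("Webhooks", corpus, expected_doc="doc-1")
print(result)
```

Each stage gets its own pass/fail signal, so a failure points you to the stage that needs attention rather than just the final answer.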

The Journey

Phase 1: Validation

“Does this actually work for my content?” When you first connect your data, evals help you:
  • Test queries you know should work
  • See what’s happening under the hood
  • Validate that your content is discoverable
  • Understand how AI systems see your data
Evals as your proof and exploration tool
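A validation pass at this phase can be as simple as a list of queries you already know the answer to. In this sketch, `search` is a hypothetical stand-in for your real retrieval endpoint:

```python
# Validation sketch: run known-good queries and check that the
# expected document surfaces. `search` is a hypothetical stand-in.

def search(query: str) -> list[str]:
    # Hypothetical: replace with a call to your real search API.
    index = {
        "pricing": ["pricing-page", "plans-faq"],
        "refund": ["refund-policy"],
    }
    return next((docs for term, docs in index.items() if term in query.lower()), [])

known_good = [
    ("What is your pricing?", "pricing-page"),
    ("How do refunds work?", "refund-policy"),
]

failures = [(q, doc) for q, doc in known_good if doc not in search(q)]
print(f"{len(known_good) - len(failures)}/{len(known_good)} validation queries passed")
```

Any entry in `failures` is a query where content you expected to be discoverable did not surface.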

Phase 2: Quality Control

“I define what ‘good’ means for my data” As you gain confidence, evals become your quality control:
  • Define what successful retrieval looks like
  • Investigate when expected content doesn’t surface
  • Validate before major content updates or launches
  • Monitor ongoing quality of access
Evals as quality control you own
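One way to make “what good means” concrete is to encode it as explicit pass criteria and run batches of results against them. The metric names and thresholds below are purely illustrative:

```python
# Quality-control sketch: express success criteria as a predicate and
# flag results that fall short. Names and thresholds are illustrative.

def meets_criteria(result: dict, min_recall: float = 0.8, max_stale_days: int = 30) -> bool:
    return result["recall"] >= min_recall and result["staleness_days"] <= max_stale_days

batch = [
    {"query": "setup guide", "recall": 0.9, "staleness_days": 3},
    {"query": "api limits", "recall": 0.6, "staleness_days": 10},
]

needs_investigation = [r["query"] for r in batch if not meets_criteria(r)]
print("Investigate:", needs_investigation)
```

Running this before a major content update gives you a pre-launch checklist you own, rather than discovering gaps after the fact.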

Phase 3: Optimization

“I can tune and improve retrieval” With established baselines, evals help you optimize:
  • Test configuration changes before applying them
  • Fine-tune how content is prioritized
  • Optimize for specific content types or topics
  • Continuously improve retrieval quality
Evals as your optimization tool
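Testing a configuration change before applying it can be framed as comparing hit rates for two configs over the same query set. Everything here is hypothetical, including the `boost_guides` option and the `retrieve_with` function:

```python
# Optimization sketch: score a baseline and a candidate configuration
# on the same cases before rolling one out. All names are hypothetical.

def retrieve_with(config: dict, query: str) -> list[str]:
    # Hypothetical retrieval that returns more guide docs when boosting is on.
    base = {"faq": ["faq-1"], "guide": ["guide-1", "guide-2"]}
    hits = base.get(query, [])
    return hits if config.get("boost_guides") or query != "guide" else hits[:1]

def hit_rate(config: dict, cases: list[tuple[str, str]]) -> float:
    hits = sum(expected in retrieve_with(config, q) for q, expected in cases)
    return hits / len(cases)

cases = [("faq", "faq-1"), ("guide", "guide-2")]
baseline = hit_rate({"boost_guides": False}, cases)
candidate = hit_rate({"boost_guides": True}, cases)
print(f"baseline={baseline:.2f} candidate={candidate:.2f}")
```

Only promote the candidate configuration when it beats the baseline on the metrics you established in the quality-control phase.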

What You Can Test

Text Content Retrieval

Evaluate how text content is accessed:
  • Search query results
  • Content relevance
  • Response accuracy
  • Coverage of your content library
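Content relevance can be approximated cheaply for a first pass with term overlap between the query and the returned snippet. Real setups would use judged labels or a model-based grader; this is only illustrative:

```python
# Crude relevance sketch: fraction of query terms found in the snippet.
# Illustrative only; not a substitute for judged relevance labels.

def relevance(query: str, snippet: str) -> float:
    q_terms = set(query.lower().split())
    s_terms = set(snippet.lower().split())
    return len(q_terms & s_terms) / len(q_terms)

score = relevance("reset account password", "How to reset your account password")
print(round(score, 2))
```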

Access Validation

Verify access controls are working:
  • Authentication requirements
  • Policy enforcement
  • Permission boundaries
  • Rate limiting
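Access checks can be evaluated the same way as queries: enumerate (identity, resource, expected outcome) cases and flag mismatches. The `POLICY` table and `check_access` helper below are hypothetical stand-ins for your real authorization layer:

```python
# Access-validation sketch: verify permission boundaries across roles.
# The policy table and check_access helper are hypothetical.

POLICY = {
    "public-doc": {"anonymous", "member", "admin"},
    "internal-doc": {"member", "admin"},
    "admin-doc": {"admin"},
}

def check_access(role: str, doc_id: str) -> bool:
    return role in POLICY.get(doc_id, set())

# Each case: (role, doc, should_be_allowed)
cases = [
    ("anonymous", "public-doc", True),
    ("anonymous", "internal-doc", False),
    ("member", "admin-doc", False),
    ("admin", "admin-doc", True),
]

violations = [(role, doc) for role, doc, expect in cases if check_access(role, doc) != expect]
print("Policy violations:", violations)
```

Note that the negative cases (access that should be denied) matter as much as the positive ones.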

Evaluation Types

Query Testing

Test specific queries against your content:
  • Does the query return expected results?
  • Is relevant content surfacing?
  • Are results ranked appropriately?
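Ranking checks are commonly phrased as “is the expected document in the top k?”. In this sketch, `ranked_results` is a hypothetical stand-in returning fixed rankings:

```python
# Query-testing sketch: assert the expected document appears in the
# top-k results. `ranked_results` is a hypothetical stand-in.

def ranked_results(query: str) -> list[str]:
    # Hypothetical fixed rankings for illustration.
    return {
        "install": ["install-guide", "quickstart", "faq"],
        "billing": ["plans", "billing-faq", "contact"],
    }.get(query, [])

def in_top_k(query: str, expected: str, k: int = 3) -> bool:
    return expected in ranked_results(query)[:k]

cases = [("install", "install-guide", 1), ("billing", "billing-faq", 3)]
results = {f"{q}@top{k}": in_top_k(q, doc, k) for q, doc, k in cases}
print(results)
```

Varying `k` per case lets you demand a strict rank for critical content while accepting a looser rank elsewhere.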

Coverage Analysis

Understand how much of your content is accessible:
  • Which content is being found?
  • Which content isn’t surfacing?
  • Identify gaps in discoverability
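One way to sketch coverage analysis: run a set of probe queries, record every document that ever surfaces, and diff that against the full corpus. `search` is again a hypothetical stand-in:

```python
# Coverage sketch: which documents are ever retrieved across a probe
# query set, and which never surface? `search` is hypothetical.

def search(query: str) -> list[str]:
    index = {"setup": ["doc-a"], "api": ["doc-b", "doc-a"]}
    return index.get(query, [])

corpus = {"doc-a", "doc-b", "doc-c"}
probes = ["setup", "api", "pricing"]

surfaced = {doc for q in probes for doc in search(q)}
gaps = corpus - surfaced
print(f"coverage={len(surfaced)/len(corpus):.0%}, gaps={sorted(gaps)}")
```

Documents in `gaps` are your discoverability gaps: content that no probe query could surface.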

Performance Testing

Measure how your data is being served:
  • Response times
  • Success rates
  • Error patterns
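A basic performance harness times repeated calls and aggregates success rate and tail latency. The wrapped `search` function here is a hypothetical stand-in; in practice you would call your real endpoint:

```python
# Performance sketch: success rate and rough p95 latency over repeated
# calls. `search` is a hypothetical stand-in for a real endpoint.
import time

def search(query: str) -> list[str]:
    return ["doc-1"] if query else []

def timed_query(query: str) -> tuple[bool, float]:
    start = time.perf_counter()
    try:
        results = search(query)
        ok = len(results) > 0
    except Exception:
        ok = False
    return ok, time.perf_counter() - start

runs = [timed_query("setup guide") for _ in range(20)]
success_rate = sum(ok for ok, _ in runs) / len(runs)
p95 = sorted(t for _, t in runs)[int(len(runs) * 0.95)]
print(f"success={success_rate:.0%} p95={p95 * 1000:.2f}ms")
```

Catching exceptions as failures (rather than crashing the harness) is what lets error patterns show up in the success-rate number.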

Best Practices

Start Simple

Begin with basic queries you know should work

Test Regularly

Run evals before major changes or launches

Define Success

Establish what “good” means for your content

Track Changes

Monitor how eval results change over time
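Tracking change over time can be as simple as diffing the current run against a stored baseline and flagging regressions beyond a tolerance. Metric names and values below are illustrative:

```python
# Tracking sketch: compare a current eval run to a stored baseline.
# Metric names, values, and the tolerance are illustrative.

baseline = {"hit_rate": 0.90, "p95_ms": 120}
current = {"hit_rate": 0.84, "p95_ms": 115}

def regressions(base: dict, cur: dict, tolerance: float = 0.02) -> list[str]:
    flagged = []
    if cur["hit_rate"] < base["hit_rate"] - tolerance:
        flagged.append("hit_rate")
    if cur["p95_ms"] > base["p95_ms"] * (1 + tolerance):
        flagged.append("p95_ms")
    return flagged

print("Regressed metrics:", regressions(baseline, current))
```

Storing each run's summary alongside the commit or content version that produced it makes regressions easy to trace back to their cause.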

Building Trust

Evals help build trust in your data delivery:
  1. Transparency - See exactly how AI systems interact with your content
  2. Control - Verify that access works as intended
  3. Quality - Ensure your data is being served correctly
  4. Confidence - Know your content is discoverable and accessible

Next Steps