
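Register and version scorers

Scorers can be registered to an MLflow experiment, which puts them under version control and makes them available to the rest of your team.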

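Supported scorers

Scorer type | Supported
Agent-as-a-Judge | ✅
Template-based LLM scorers | ✅
Code-based scorers | ✅
Guidelines-based LLM scorers | ❌ (use the MLflow Prompt Registry instead)
Predefined scorers | ❌ (prompts are hard-coded in MLflow)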

Usage

Prerequisites

Judges are registered to an **MLflow experiment** (not at the run level), so create or select an experiment first.

import mlflow

mlflow.set_tracking_uri("your-tracking-uri")
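# Keep the returned ID; the register and list examples below pass it as experiment_id
experiment_id = mlflow.create_experiment("evaluation-judges")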
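
mlflow.create_experiment raises an error when an experiment with the same name already exists, so the setup above fails if it is run a second time. A minimal get-or-create sketch is shown below. The helper name get_or_create_experiment and its injected api parameter are illustrative assumptions, not part of MLflow; it relies only on mlflow.get_experiment_by_name and mlflow.create_experiment.

```python
def get_or_create_experiment(api, name):
    """Return the ID of the experiment called `name`, creating it if needed.

    `api` is any object exposing `get_experiment_by_name` and
    `create_experiment`; in practice this is the `mlflow` module itself.
    (Hypothetical helper for illustration, not an MLflow function.)
    """
    existing = api.get_experiment_by_name(name)  # None when no such experiment
    if existing is not None:
        return existing.experiment_id
    return api.create_experiment(name)  # returns the new experiment's ID
```

Calling get_or_create_experiment(mlflow, "evaluation-judges") in place of the direct create_experiment call should make the setup safe to rerun.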

Define a sample template-based LLM scorer

from mlflow.genai.judges import make_judge

quality_judge = make_judge(
    name="response_quality",
    instructions=("Evaluate if {{ outputs }} is high quality for {{ inputs }}."),
    model="anthropic:/claude-opus-4-1-20250805",
)
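
The {{ inputs }} and {{ outputs }} placeholders in instructions are template variables that are filled with each evaluated example's data before the prompt reaches the model. The sketch below imitates that substitution with a regular expression, to show roughly what the judge ends up reading. render_instructions is a hypothetical helper, not MLflow's renderer, and the real formatting of the values may differ.

```python
import re


def render_instructions(template, **fields):
    """Replace each `{{ name }}` placeholder with str(fields[name])."""

    def substitute(match):
        key = match.group(1)
        if key not in fields:
            raise KeyError(f"no value supplied for template variable {key!r}")
        return str(fields[key])

    # Matches `{{ name }}` with optional whitespace inside the braces
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


prompt = render_instructions(
    "Evaluate if {{ outputs }} is high quality for {{ inputs }}.",
    inputs="What is MLflow?",
    outputs="MLflow is an open-source MLOps platform.",
)
print(prompt)
```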

Register the scorer

To register a judge to an experiment, call the register method on the judge instance.

# Register the judge (with no arguments, it goes to the currently active experiment)
registered = quality_judge.register()
# You can pass experiment_id to register the judge to a specific experiment
# registered = quality_judge.register(experiment_id=experiment_id)

Update the scorer

Registering a scorer under a name that already exists creates a new version of that scorer.

# Update and register a new version of the judge
quality_judge_v2 = make_judge(
    name="response_quality",  # Same name
    instructions=(
        "Evaluate if {{ outputs }} is high quality, accurate, and complete "
        "for the question in {{ inputs }}."
    ),
    model="anthropic:/claude-3-5-sonnet-20241022",  # Updated model
)

# Register the updated judge
registered_v2 = quality_judge_v2.register(experiment_id=experiment_id)
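
Conceptually, versions are keyed by experiment and scorer name: each registration under an existing name appends a version, and a lookup without a version returns the newest one. The toy in-memory model below illustrates that behavior only. ToyScorerRegistry and its methods are hypothetical, and MLflow's actual storage and version numbering may differ (numbering from 1 is an assumption).

```python
from collections import defaultdict


class ToyScorerRegistry:
    """In-memory stand-in for a scorer registry (illustration only)."""

    def __init__(self):
        # (experiment_id, scorer name) -> list of registered definitions
        self._versions = defaultdict(list)

    def register(self, experiment_id, name, definition):
        """Store a definition and return its 1-based version number."""
        history = self._versions[(experiment_id, name)]
        history.append(definition)
        return len(history)

    def get(self, experiment_id, name, version=None):
        """Return a specific version, or the latest when version is None."""
        history = self._versions.get((experiment_id, name))
        if not history:
            raise KeyError(f"no scorer {name!r} in experiment {experiment_id!r}")
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"scorer {name!r} has no version {version}")
        return history[version - 1]


registry = ToyScorerRegistry()
v1 = registry.register("1", "response_quality", "instructions v1")
v2 = registry.register("1", "response_quality", "instructions v2")
```

Because the experiment ID is part of the key, the same scorer name in a different experiment starts its own version history.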

Load a scorer

To load a registered scorer, use the get_scorer function.

from mlflow.genai.scorers import get_scorer

# Get the latest version
latest_judge = get_scorer(name="response_quality")
# or specify experiment_id to get a scorer from a specific experiment
# latest_judge = get_scorer(name="response_quality", experiment_id=experiment_id)

List scorers

The list_scorers function returns a list of the scorers registered in an experiment.

from mlflow.genai.scorers import list_scorers

all_scorers = list_scorers(experiment_id=experiment_id)
for scorer in all_scorers:
    print(f"Scorer: {scorer.name}, Model: {scorer.model}")
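
Code-based scorers can be registered as well, and they may not carry a model attribute, so a loop that reads scorer.model directly could fail on a mixed list. A defensive sketch follows, assuming only that every scorer has a name. describe_scorers is a hypothetical helper, not an MLflow function.

```python
def describe_scorers(scorers):
    """Return one summary line per scorer, tolerating a missing `model`."""
    lines = []
    for scorer in scorers:
        # Fall back to None when the scorer type has no `model` attribute
        model = getattr(scorer, "model", None)
        lines.append(f"Scorer: {scorer.name}, Model: {model or 'n/a'}")
    return lines
```

Passing the result of list_scorers(experiment_id=experiment_id) to this helper and printing each returned line gives the same listing as the loop above for LLM judges.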

UI support

Coming soon!