Register and Version Scorers
Scorers can be registered to an MLflow experiment for version control and team collaboration.
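Supported Scorers
Scorer type | Supported |
---|---|
Agent-as-a-Judge | ✅ |
Template-based LLM scorers | ✅ |
Code-based scorers | ✅ |
Guidelines-based LLM scorers | ❌ (use the MLflow Prompt Registry instead) |
Predefined scorers | ❌ (prompts are hardcoded in MLflow) |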
Usage
Prerequisites
Judges are registered to an **MLflow experiment**, not to an individual run.
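import mlflow
mlflow.set_tracking_uri("your-tracking-uri")
# create_experiment returns the new experiment's ID, which the later examples reuse
experiment_id = mlflow.create_experiment("evaluation-judges")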
Define an example template-based LLM scorer
from mlflow.genai.judges import make_judge
quality_judge = make_judge(
    name="response_quality",
    instructions="Evaluate if {{ outputs }} is high quality for {{ inputs }}.",
    model="anthropic:/claude-opus-4-1-20250805",
)
Register a Scorer
To register a judge to an experiment, call the register() method on the judge instance.
# Register the judge to the currently active experiment
registered = quality_judge.register()
# You can pass experiment_id to register the judge to a specific experiment
# registered = quality_judge.register(experiment_id=experiment_id)
Update a Scorer
Registering a scorer under a name that already exists creates a new version of that scorer.
# Update and register a new version of the judge
quality_judge_v2 = make_judge(
    name="response_quality",  # Same name
    instructions=(
        "Evaluate if {{ outputs }} is high quality, accurate, and complete "
        "for the question in {{ inputs }}."
    ),
    model="anthropic:/claude-3-5-sonnet-20241022",  # Updated model
)
# Register the updated judge
registered_v2 = quality_judge_v2.register(experiment_id=experiment_id)
Load a Scorer
To load a registered scorer, use the get_scorer function.
from mlflow.genai.scorers import get_scorer
# Get the latest version
latest_judge = get_scorer(name="response_quality")
# or specify experiment_id to get a scorer from a specific experiment
# latest_judge = get_scorer(name="response_quality", experiment_id=experiment_id)
List Scorers
The list_scorers function returns a list of the scorers registered in an experiment.
from mlflow.genai.scorers import list_scorers
all_scorers = list_scorers(experiment_id=experiment_id)
for scorer in all_scorers:
    # Code-based scorers may not define a model, so fall back to None
    print(f"Scorer: {scorer.name}, Model: {getattr(scorer, 'model', None)}")
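The register, update, load, and list steps above can be pictured with a small in-memory model. This is only an illustrative sketch of the versioning semantics described on this page: ToyScorerRegistry, ToyScorerVersion, and their methods are hypothetical names that are not part of MLflow, and returning only the latest version of each scorer from list_latest is an assumption made for the example, not a statement about what list_scorers returns.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyScorerVersion:
    """One immutable registered version of a scorer (hypothetical type)."""
    name: str
    version: int
    instructions: str
    model: str


@dataclass
class ToyScorerRegistry:
    """Hypothetical in-memory stand-in for an experiment-scoped scorer registry."""
    # Maps (experiment_id, scorer name) -> list of versions, oldest first.
    _store: dict = field(default_factory=dict)

    def register(self, experiment_id, name, instructions, model):
        versions = self._store.setdefault((experiment_id, name), [])
        # Reusing a name appends a new version instead of overwriting the old one.
        entry = ToyScorerVersion(name, len(versions) + 1, instructions, model)
        versions.append(entry)
        return entry

    def get_latest(self, experiment_id, name):
        # Raises KeyError if nothing is registered under this name.
        return self._store[(experiment_id, name)][-1]

    def list_latest(self, experiment_id):
        # Latest version of every scorer registered in the given experiment.
        return [v[-1] for (exp, _), v in self._store.items() if exp == experiment_id]


registry = ToyScorerRegistry()
v1 = registry.register("exp-1", "response_quality", "Evaluate quality.", "model-a")
v2 = registry.register(
    "exp-1", "response_quality", "Evaluate quality and accuracy.", "model-b"
)
latest = registry.get_latest("exp-1", "response_quality")
print(latest.version, latest.model)  # prints: 2 model-b
```

Because versions are appended rather than replaced, the first version stays available for comparison after an update, which is the property that makes registering under the same name safe for a team sharing one experiment.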
UI Support
Coming soon!