Register and Version Scorers
Scorers can be registered to an MLflow experiment for version control and team collaboration.
Supported scorers
| Scorer type | Supported |
|---|---|
| Agent-as-a-Judge | ✅ |
| Template-based LLM scorers | ✅ |
| Code-based scorers | ✅ |
| Guidelines-based LLM scorers | ❌ (use MLflow Prompt Registry instead) |
| Predefined scorers | ❌ (prompt is hardcoded in MLflow) |
Usage
Prerequisites
Judges are registered to an MLflow experiment, not to an individual run, so create or select an experiment first.
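```python
import mlflow

mlflow.set_tracking_uri("your-tracking-uri")

# create_experiment returns the new experiment's ID; keep it for the later calls
experiment_id = mlflow.create_experiment("evaluation-judges")
# Make it the active experiment so register() without arguments targets it
mlflow.set_experiment(experiment_id=experiment_id)
```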
Define an example template-based LLM scorer
```python
from mlflow.genai.judges import make_judge

quality_judge = make_judge(
    name="response_quality",
    instructions=("Evaluate if {{ outputs }} is high quality for {{ inputs }}."),
    model="anthropic:/claude-opus-4-1-20250805",
    feedback_value_type=str,
)
```
Register a scorer
To register a judge to an experiment, call the `register` method on the judge instance.
```python
# Register the judge
registered = quality_judge.register()

# You can pass experiment_id to register the judge to a specific experiment
# registered = quality_judge.register(experiment_id=experiment_id)
```
Update a scorer
Registering a new scorer under an existing name creates a new version of that scorer.
```python
# Update and register a new version of the judge
quality_judge_v2 = make_judge(
    name="response_quality",  # Same name
    instructions=(
        "Evaluate if {{ outputs }} is high quality, accurate, and complete "
        "for the question in {{ inputs }}."
    ),
    model="anthropic:/claude-3-5-sonnet-20241022",  # Updated model
    feedback_value_type=str,
)

# Register the updated judge
registered_v2 = quality_judge_v2.register(experiment_id=experiment_id)
```
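The versioning rule above (same name, new version) can be pictured with a small in-memory model. This is only an illustrative sketch: `ToyScorerRegistry` and `ScorerVersion` are hypothetical names, the numbering from 1 is an assumption, and none of this reflects how MLflow stores scorers internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScorerVersion:
    """One immutable registered version (hypothetical record, not an MLflow type)."""
    name: str
    version: int
    instructions: str
    model: str


@dataclass
class ToyScorerRegistry:
    """Illustrative in-memory registry; NOT MLflow's implementation."""
    # Maps (experiment_id, scorer name) -> list of versions, oldest first.
    _store: dict = field(default_factory=dict)

    def register(self, experiment_id, name, instructions, model):
        versions = self._store.setdefault((experiment_id, name), [])
        # Assumption: versions are numbered 1, 2, 3, ... per (experiment, name).
        record = ScorerVersion(name, len(versions) + 1, instructions, model)
        versions.append(record)
        return record

    def get(self, experiment_id, name, version=None):
        versions = self._store[(experiment_id, name)]
        # No explicit version means "latest"; otherwise look up the pinned one.
        return versions[-1] if version is None else versions[version - 1]


registry = ToyScorerRegistry()
v1 = registry.register("exp-1", "response_quality", "Is {{ outputs }} high quality?", "model-a")
v2 = registry.register("exp-1", "response_quality", "Is {{ outputs }} accurate and complete?", "model-b")
latest = registry.get("exp-1", "response_quality")
print(v1.version, v2.version, latest.version)
```

Earlier versions stay retrievable in this model, which is the property that makes re-registering under the same name safe for teammates who depend on an older definition.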
Load a scorer
To load a registered scorer, use the `get_scorer` function.
```python
from mlflow.genai.scorers import get_scorer

# Get the latest version
latest_judge = get_scorer(name="response_quality")

# or specify experiment_id to get a scorer from a specific experiment
# latest_judge = get_scorer(name="response_quality", experiment_id=experiment_id)
```
List scorers
The `list_scorers` function returns the list of scorers registered in an experiment.
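```python
from mlflow.genai.scorers import list_scorers

all_scorers = list_scorers(experiment_id=experiment_id)
for scorer in all_scorers:
    # Code-based scorers may not expose a model, so fall back to a placeholder
    model = getattr(scorer, "model", "n/a")
    print(f"Scorer: {scorer.name}, Model: {model}")
```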
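Since the support table lists both LLM-based and code-based scorers as registrable, a listing may mix entries that carry a model with entries that do not. The sketch below shows one defensive way to summarize such a list. `FakeJudge` and `FakeCodeScorer` are hypothetical stand-ins so the example runs without MLflow, and the assumption that code-based scorers lack a `model` attribute has not been checked against the library.

```python
from dataclasses import dataclass


@dataclass
class FakeJudge:
    """Stand-in for an LLM-based scorer (hypothetical, not an MLflow class)."""
    name: str
    model: str


@dataclass
class FakeCodeScorer:
    """Stand-in for a code-based scorer, assumed to have no model."""
    name: str


def describe(scorer):
    """Return a one-line summary, tolerating scorers without a model."""
    model = getattr(scorer, "model", None)
    if model is None:
        return f"Scorer: {scorer.name} (no model)"
    return f"Scorer: {scorer.name}, Model: {model}"


listed = [
    FakeJudge("response_quality", "provider:/some-model"),
    FakeCodeScorer("exact_match"),
]
summaries = [describe(s) for s in listed]
for line in summaries:
    print(line)
```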
UI support
Coming soon!