Register and Version Scorers

Scorers can be registered to MLflow experiments for version control and team collaboration.

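Supported scorers

Scorer type | Supported
Agent-as-a-Judge | ✅
Template-based LLM scorers | ✅
Code-based scorers | ✅
Guidelines-based LLM scorers | ❌ (use the MLflow Prompt Registry instead)
Predefined scorers | ❌ (prompts are hardcoded in MLflow)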

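Usage

Prerequisites

Scorers are registered to an MLflow experiment, not to an individual run, so an experiment must exist before you register one.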

python
import mlflow

mlflow.set_tracking_uri("your-tracking-uri")
# create_experiment returns the new experiment's ID; the later examples reuse it
experiment_id = mlflow.create_experiment("evaluation-judges")

Define an example template-based LLM scorer

python
from mlflow.genai.judges import make_judge

quality_judge = make_judge(
    name="response_quality",
    instructions="Evaluate if {{ outputs }} is high quality for {{ inputs }}.",
    model="anthropic:/claude-opus-4-1-20250805",
    feedback_value_type=str,
)
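
The {{ inputs }} and {{ outputs }} placeholders in the instructions mark where the evaluated request and response are referenced. As a rough illustration of that idea only, the standalone sketch below finds the placeholders in a template and substitutes JSON-encoded values. It does not use MLflow, and MLflow's own handling of template variables may differ; find_placeholders and render are hypothetical helpers introduced here.

```python
import json
import re

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_placeholders(template: str) -> list[str]:
    """Return placeholder names in order of first appearance (hypothetical helper)."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def render(template: str, **values) -> str:
    """Replace each placeholder with the JSON form of its value (hypothetical helper)."""
    missing = [n for n in find_placeholders(template) if n not in values]
    if missing:
        raise KeyError(f"missing values for: {missing}")
    return PLACEHOLDER.sub(lambda m: json.dumps(values[m.group(1)]), template)


template = "Evaluate if {{ outputs }} is high quality for {{ inputs }}."
prompt = render(
    template,
    inputs={"question": "What is MLflow?"},
    outputs="An open source ML platform.",
)
print(prompt)
```

Checking placeholders up front like this can catch a misspelled variable name before any evaluation call is made.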

Register a scorer

To register a scorer to an experiment, call the register method on the scorer instance.

python
# Register the judge
registered = quality_judge.register()
# You can pass experiment_id to register the judge to a specific experiment
# registered = quality_judge.register(experiment_id=experiment_id)

Update a scorer

Registering a new scorer under the same name creates a new version.

python
# Update and register a new version of the judge
quality_judge_v2 = make_judge(
    name="response_quality",  # Same name
    instructions=(
        "Evaluate if {{ outputs }} is high quality, accurate, and complete "
        "for the question in {{ inputs }}."
    ),
    model="anthropic:/claude-3-5-sonnet-20241022",  # Updated model
    feedback_value_type=str,
)

# Register the updated judge
registered_v2 = quality_judge_v2.register(experiment_id=experiment_id)
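
The versioning behavior described above (same name, new version, scoped to one experiment) can be pictured with a small in-memory model. This is a toy sketch, not MLflow's implementation: InMemoryScorerRegistry, its methods, and the 1-based version numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryScorerRegistry:
    """Toy stand-in for a scorer registry (hypothetical, not MLflow's code)."""

    # Maps (experiment_id, name) to the registered instructions, oldest first.
    _versions: dict = field(default_factory=dict)

    def register(self, experiment_id: str, name: str, instructions: str) -> int:
        """Append a definition and return its version (assumed to start at 1)."""
        history = self._versions.setdefault((experiment_id, name), [])
        history.append(instructions)
        return len(history)

    def get_latest(self, experiment_id: str, name: str) -> tuple[int, str]:
        """Return (version, instructions) for the newest registration."""
        history = self._versions[(experiment_id, name)]
        return len(history), history[-1]


registry = InMemoryScorerRegistry()
v1 = registry.register(
    "exp-1", "response_quality",
    "Evaluate if {{ outputs }} is high quality for {{ inputs }}.",
)
v2 = registry.register(
    "exp-1", "response_quality",  # same name, so this becomes a new version
    "Evaluate if {{ outputs }} is high quality, accurate, and complete "
    "for the question in {{ inputs }}.",
)
latest_version, latest_instructions = registry.get_latest("exp-1", "response_quality")
print(v1, v2, latest_version)
```

Keying the history on both the experiment ID and the name mirrors the point that a scorer name is only meaningful within the experiment it was registered to.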

Load a scorer

To load a registered scorer, use the get_scorer function.

python
from mlflow.genai.scorers import get_scorer

# Get the latest version
latest_judge = get_scorer(name="response_quality")
# or specify experiment_id to get a scorer from a specific experiment
# latest_judge = get_scorer(name="response_quality", experiment_id=experiment_id)

List scorers

The list_scorers function returns a list of the scorers registered in an experiment.

python
from mlflow.genai.scorers import list_scorers

all_scorers = list_scorers(experiment_id=experiment_id)
for scorer in all_scorers:
    # Code-based scorers may not have a model attribute, so fall back to "n/a"
    print(f"Scorer: {scorer.name}, Model: {getattr(scorer, 'model', 'n/a')}")

UI support

Coming soon!