
Judge Alignment: Teaching AI to Match Human Preferences

Turn generic judges into domain experts

Judge alignment is the process of optimizing an LLM judge to match human evaluation standards. By systematically learning from human feedback, a judge evolves from a generic evaluator into a domain expert that understands your unique quality criteria.

Why Alignment Matters

Even the most advanced LLMs need calibration to match your specific evaluation criteria. What counts as "good" customer service varies by industry. Medical accuracy requirements differ from those of general health advice. Alignment bridges this gap by example, teaching the judge your particular quality standards.

Learn from Expert Feedback

Judges improve by learning from your domain experts' evaluations, capturing nuanced quality criteria that generic prompts miss.

Consistent Standards at Scale

Once aligned, a judge applies your exact quality standards consistently across millions of evaluations.

Continuous Improvement

As your standards evolve, judges can be re-aligned with fresh feedback, keeping them relevant over time.

Fewer Evaluation Errors

Aligned judges produce 30-50% fewer false positives and false negatives than generic evaluation prompts.

How Judge Alignment Works

The Alignment Lifecycle

Create Initial Judge → Collect Human Feedback → Run Alignment → Validate Accuracy → Monitor & Iterate, looping back for continuous refinement.

Quick Start: Align Your First Judge

Key Alignment Requirements

For alignment to work, every trace must contain both a **judge assessment** and **human feedback**, recorded under the **same assessment name**. The alignment process learns by comparing the judge's assessment with the human feedback on the same trace.

The assessment name must match the judge name exactly: if your judge is named "product_quality", both the judge's assessment and the human feedback must be logged under the name "product_quality".

Order does not matter: humans can provide feedback before or after the judge's assessment.

Note: Currently, only field-based evaluation with {{ inputs }} and {{ outputs }} templates is supported. Support for Agent-as-a-Judge evaluation ({{ trace }}) and expectations ({{ expectations }}) is not yet available.
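As a quick sanity check, you can inspect a trace's assessments to confirm that both pieces are present under the judge's name before running alignment. A minimal sketch, reusing the search_assessments and source.source_type accessors shown in the testing example later on this page; the trace ID is a placeholder, and the "LLM_JUDGE" source type is assumed for judge-generated assessments.

import mlflow

# Sanity check: does this trace carry both assessments under the judge's name?
trace = mlflow.get_trace("<trace-id>")  # placeholder: use one of your trace IDs

feedbacks = [
    a
    for a in trace.search_assessments(type="feedback")
    if a.name == "product_quality"  # must match the judge name
]
has_judge = any(a.source.source_type == "LLM_JUDGE" for a in feedbacks)  # assumed source type
has_human = any(a.source.source_type == "HUMAN" for a in feedbacks)

print(f"judge assessment: {has_judge}, human feedback: {has_human}")
# The trace only counts toward alignment when both are True.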

Step 1: Set Up and Generate Traces

First, create your judge and generate traces with initial assessments.

from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import SIMBAAlignmentOptimizer
from mlflow.entities import AssessmentSource, AssessmentSourceType
import mlflow

# Create experiment and initial judge
experiment_id = mlflow.create_experiment("product-quality-alignment")
mlflow.set_experiment(experiment_id=experiment_id)

initial_judge = make_judge(
    name="product_quality",
    instructions=(
        "Evaluate if the product description in {{ outputs }} "
        "is accurate and helpful for the query in {{ inputs }}. "
        "Rate as: excellent, good, fair, or poor"
    ),
    model="anthropic:/claude-opus-4-1-20250805",
)

# Generate traces from your application (minimum 10 required)
traces = []
for i in range(15):  # Generate 15 traces (more than minimum of 10)
    with mlflow.start_span(f"product_description_{i}") as span:
        # Your application logic
        query = f"Product query {i}"
        description = f"Product description for query {i}"
        span.set_inputs({"query": query})
        span.set_outputs({"description": description})
        traces.append(span.trace_id)

# Run the judge on these traces to get initial assessments
for trace_id in traces:
    trace = mlflow.get_trace(trace_id)

    # Extract inputs and outputs from the trace for field-based evaluation
    inputs = trace.data.spans[0].inputs  # Get inputs from trace
    outputs = trace.data.spans[0].outputs  # Get outputs from trace

    # Judge evaluates using field-based approach (inputs/outputs)
    judge_result = initial_judge(inputs=inputs, outputs=outputs)
    # Judge's assessment is automatically logged when called

Step 2: Collect Human Feedback

After running the judge on your traces, you need to collect human feedback. You can either:

  • Use the MLflow UI (recommended): review traces and add feedback through an intuitive interface.
  • Log feedback programmatically: if you already have ground-truth labels (see the sketch below).

For detailed instructions on collecting feedback, see Collecting Feedback for Alignment below.
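If you already have ground-truth labels, one option is to attach them with mlflow.log_feedback, reusing the traces list and the AssessmentSource imports from Step 1. A minimal sketch: the label value, rationale, and reviewer ID are placeholders, and in practice each trace would receive its own human rating.

# Log human feedback programmatically, using the SAME name as the judge
for trace_id in traces:
    mlflow.log_feedback(
        trace_id=trace_id,
        name="product_quality",  # must match the judge name exactly
        value="good",  # placeholder: substitute your ground-truth rating per trace
        source=AssessmentSource(
            source_type=AssessmentSourceType.HUMAN,
            source_id="reviewer@example.com",  # hypothetical reviewer identifier
        ),
        rationale="Accurate description, but misses key details",  # optional
    )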

Step 3: Align and Register

Once feedback is collected, align your judge and register it.

# Retrieve traces with both judge and human assessments
traces_for_alignment = mlflow.search_traces(
    experiment_ids=[experiment_id], max_results=15, return_type="list"
)

# Align the judge using human corrections (minimum 10 traces recommended)
if len(traces_for_alignment) >= 10:
    optimizer = SIMBAAlignmentOptimizer(model="anthropic:/claude-opus-4-1-20250805")

    # Run alignment - shows minimal progress by default:
    # INFO: Starting SIMBA optimization with 15 examples (set logging to DEBUG for detailed output)
    # INFO: SIMBA optimization completed
    aligned_judge = initial_judge.align(traces_for_alignment, optimizer)

    # Register the aligned judge
    aligned_judge.register(experiment_id=experiment_id)
    print("Judge aligned successfully with human feedback")
else:
    print(f"Need at least 10 traces for alignment, have {len(traces_for_alignment)}")

The SIMBA Alignment Optimizer

MLflow provides a **default alignment optimizer** based on DSPy's SIMBA (Stochastic Introspective Mini-Batch Ascent) implementation. When you call align() without specifying an optimizer, the SIMBA optimizer is used automatically.

# Default: Uses SIMBA optimizer automatically
aligned_judge = initial_judge.align(traces_with_feedback)

# Explicit: Same as above but with custom model specification
from mlflow.genai.judges.optimizers import SIMBAAlignmentOptimizer

optimizer = SIMBAAlignmentOptimizer(
    model="anthropic:/claude-opus-4-1-20250805"  # Model used for optimization
)
aligned_judge = initial_judge.align(traces_with_feedback, optimizer)

# Requirements for alignment:
# - Minimum 10 traces with BOTH judge assessments and human feedback
# - Both assessments must use the same name (matching the judge name)
# - Order doesn't matter - humans can assess before or after the judge
# - A mix of agreements and disagreements between judge and human is recommended

Default Optimizer Behavior

When align() is called without an optimizer argument, MLflow automatically uses the SIMBA optimizer. This keeps the alignment workflow simple while still allowing customization when needed.

Controlling Optimization Output

By default, alignment shows only minimal progress information to keep logs clean. If you need to debug the optimization process or see detailed iteration-by-iteration progress, enable DEBUG logging.

import logging

# Enable detailed optimization output
logging.getLogger("mlflow.genai.judges.optimizers.simba").setLevel(logging.DEBUG)

# Now alignment will show:
# - Detailed iteration-by-iteration progress
# - Score improvements at each step
# - Strategy selection details
# - Full DSPy optimization output

aligned_judge = initial_judge.align(traces_with_feedback, optimizer)

# Reset to default (minimal output) after debugging
logging.getLogger("mlflow.genai.judges.optimizers.simba").setLevel(logging.INFO)

When to Use Verbose Logging

Enable DEBUG logging when:

  • Optimization appears stalled or is taking too long
  • You want to see how the optimizer improves the instructions
  • You are debugging alignment failures or unexpected results
  • You want to understand how the SIMBA optimizer works internally

For production use, keep logging at INFO (the default) to avoid verbose output.

Collecting Feedback for Alignment

The quality of alignment depends on the quality and quantity of your feedback. Choose the approach that best fits your situation.

Feedback Collection Methods

When to use: you have no existing ground-truth labels and need to collect human feedback.

The MLflow UI provides an intuitive interface for reviewing traces and adding feedback.

  1. Navigate to the "Traces" tab in your experiment
  2. Click an individual trace to view its inputs, outputs, and any existing judge assessments
  3. Click the "Add Feedback" button to add feedback
  4. Choose an assessment name that matches your judge's name (for example, "product_quality")
  5. Provide a rating based on your evaluation criteria

Tips for Effective Feedback Collection

  • If you are **not a domain expert**: assign traces to team members or domain experts for review.
  • If you **are a domain expert**: create a rubric or guidance document to keep your ratings consistent.
  • For **multiple reviewers**: organize feedback sessions where reviewers can work through batches together.
  • For **consistency**: document your evaluation criteria clearly before you start.

The UI automatically logs feedback in the correct format for alignment.

MLflow UI Feedback Interface

Diverse Reviewers

Incorporate feedback from multiple experts to capture different perspectives and reduce individual bias.

Balanced Examples

Include both positive and negative examples. Aim for at least 30% of each so the judge can learn the boundaries.

Sufficient Quantity

Collect at least 10 feedback examples (SIMBA's minimum requirement); 50-100 examples typically produce better results.

Consistent Standards

Make sure reviewers apply consistent criteria. Provide guidelines or a rubric to standardize evaluations.
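Before running alignment, it is worth checking how many traces actually carry both assessments and how the human labels are distributed. A rough sketch, assuming the experiment_id and "product_quality" judge name from the Quick Start; the "LLM_JUDGE" source type is assumed for judge-generated assessments, and the counting logic is illustrative rather than an MLflow API.

from collections import Counter

import mlflow

traces = mlflow.search_traces(experiment_ids=[experiment_id], return_type="list")

ready = 0
label_counts = Counter()
for trace in traces:
    feedbacks = [
        a
        for a in trace.search_assessments(type="feedback")
        if a.name == "product_quality"
    ]
    human = next((a for a in feedbacks if a.source.source_type == "HUMAN"), None)
    judge = next((a for a in feedbacks if a.source.source_type == "LLM_JUDGE"), None)
    if human and judge:
        ready += 1
        label_counts[human.value] += 1  # track label balance (excellent/good/fair/poor)

print(f"{ready} traces are ready for alignment (minimum 10, 50-100 recommended)")
print(f"Human label distribution: {dict(label_counts)}")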

Custom Alignment Optimizers

MLflow's alignment system is designed as a **plugin architecture**, so you can create custom optimizers that implement different alignment strategies. This extensibility lets you build domain-specific optimization approaches on top of MLflow's judge infrastructure.

Creating a Custom Optimizer

To create a custom alignment optimizer, extend the AlignmentOptimizer abstract base class.

from mlflow.genai.judges.base import AlignmentOptimizer, Judge
from mlflow.entities.trace import Trace


class MyCustomOptimizer(AlignmentOptimizer):
    """Custom optimizer implementation for judge alignment."""

    def __init__(self, model: str = None, **kwargs):
        """Initialize your optimizer with custom parameters."""
        self.model = model
        # Add any custom initialization logic

    def align(self, judge: Judge, traces: list[Trace]) -> Judge:
        """
        Implement your alignment algorithm.

        Args:
            judge: The judge to be optimized
            traces: List of traces containing human feedback

        Returns:
            A new Judge instance with improved alignment
        """
        # Your custom alignment logic here:
        # 1. Extract feedback from traces
        # 2. Analyze disagreements between judge and human
        # 3. Generate improved instructions
        # 4. Return a new judge with better alignment

        # Example: return a judge with modified instructions
        from mlflow.genai.judges import make_judge

        improved_instructions = self._optimize_instructions(judge.instructions, traces)

        return make_judge(
            name=judge.name,
            instructions=improved_instructions,
            model=judge.model,
        )

    def _optimize_instructions(self, instructions: str, traces: list[Trace]) -> str:
        """Your custom optimization logic."""
        # Implement your optimization strategy
        pass

Using a Custom Optimizer

Once implemented, use your custom optimizer just like the built-in one.

# Create your custom optimizer
custom_optimizer = MyCustomOptimizer(model="your-model")

# Use it for alignment
aligned_judge = initial_judge.align(traces_with_feedback, custom_optimizer)

Available Optimizers

MLflow currently provides:

  • SIMBAAlignmentOptimizer (default): based on DSPy's SIMBA implementation, as described above.

The plugin architecture ensures that new optimization strategies can be added without modifying the core judge system, encouraging extensibility and experimentation with different alignment approaches.

Testing Alignment Effectiveness

Verify that alignment actually improved your judge.

def test_alignment_improvement(
    original_judge, aligned_judge, test_traces: list
) -> dict:
    """Compare judge performance before and after alignment."""

    original_correct = 0
    aligned_correct = 0
    evaluated = 0  # only traces with human feedback count toward accuracy

    for trace in test_traces:
        # Get human ground truth from trace assessments
        feedbacks = trace.search_assessments(type="feedback")
        human_feedback = next(
            (f for f in feedbacks if f.source.source_type == "HUMAN"), None
        )

        if not human_feedback:
            continue
        evaluated += 1

        # Get judge evaluations
        original_eval = original_judge(trace=trace)
        aligned_eval = aligned_judge(trace=trace)

        # Check agreement with human
        if original_eval.value == human_feedback.value:
            original_correct += 1
        if aligned_eval.value == human_feedback.value:
            aligned_correct += 1

    if evaluated == 0:
        raise ValueError("No test traces contain human feedback")

    return {
        "original_accuracy": original_correct / evaluated,
        "aligned_accuracy": aligned_correct / evaluated,
        "improvement": (aligned_correct - original_correct) / evaluated,
    }
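A possible way to call this helper, assuming the initial_judge, aligned_judge, and experiment_id from the Quick Start; in practice, test_traces should be a held-out set with human feedback that was not used during alignment.

# Evaluate on held-out traces (reusing the alignment traces will inflate the results)
test_traces = mlflow.search_traces(
    experiment_ids=[experiment_id], max_results=20, return_type="list"
)

results = test_alignment_improvement(initial_judge, aligned_judge, test_traces)
print(f"Original accuracy: {results['original_accuracy']:.1%}")
print(f"Aligned accuracy:  {results['aligned_accuracy']:.1%}")
print(f"Improvement:       {results['improvement']:+.1%}")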

Next Steps