
Judge Alignment: Teaching AI to Match Human Preferences

Transform a generic judge into a domain expert

Judge alignment is the process of optimizing an LLM judge to match human evaluation standards. By systematically learning from human feedback, a judge evolves from a generic evaluator into a domain expert that understands your unique quality criteria.

Why Alignment Matters

Even the most sophisticated LLMs need calibration to match your specific evaluation criteria. What counts as "good" customer service varies by industry, and medical accuracy requirements differ from those for general health advice. Alignment bridges this gap by example, teaching the judge your specific quality standards.

Learn from Expert Feedback

Judges improve by learning from domain experts' evaluations, capturing nuanced quality criteria that generic prompts miss.

Consistent Standards at Scale

Once aligned, a judge applies your exact quality standards consistently across millions of evaluations.

Continuous Improvement

As your standards evolve, judges can be re-aligned with new feedback, keeping them relevant over time.

Reduced Evaluation Errors

Aligned judges produce 30-50% fewer false positives and false negatives than generic evaluation prompts.

How Judge Alignment Works

The Alignment Lifecycle

The lifecycle is a continuous refinement loop:

  1. Create an initial judge
  2. Collect human feedback
  3. Run alignment
  4. Validate accuracy
  5. Monitor and iterate

Quick Start: Align Your First Judge

Key Requirements for Alignment

For alignment to work, every trace must carry both a judge assessment and human feedback, **recorded under the same assessment name**. The alignment process learns by comparing the judge's assessment with the human feedback on the same trace.

The assessment name must exactly match the judge name: if your judge is named "product_quality", both the judge's assessment and the human feedback must use the name "product_quality".

Order doesn't matter; humans can provide feedback before or after the judge's assessment. A quick readiness check is sketched below.
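
As an illustrative check (not an official recipe), you can inspect a trace's assessments to confirm that both pieces are present under the judge's name. This sketch assumes judge assessments are recorded with an LLM_JUDGE source type:

python
# Hypothetical readiness check for a single trace; the assessment name and
# source-type strings are assumptions based on the examples on this page.
import mlflow

trace = mlflow.get_trace(trace_id)  # a trace ID from your application
feedbacks = trace.search_assessments(type="feedback")
pairs = {(f.name, f.source.source_type) for f in feedbacks}

has_judge = ("product_quality", "LLM_JUDGE") in pairs  # judge assessment present
has_human = ("product_quality", "HUMAN") in pairs  # human feedback present
print("Ready for alignment:", has_judge and has_human)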

Step 1: Set Up and Generate Traces

First, create your judge and generate traces with initial assessments:

python
from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import SIMBAAlignmentOptimizer
from mlflow.entities import AssessmentSource, AssessmentSourceType
from typing import Literal
import mlflow

# Create experiment and initial judge
experiment_id = mlflow.create_experiment("product-quality-alignment")
mlflow.set_experiment(experiment_id=experiment_id)

initial_judge = make_judge(
    name="product_quality",
    instructions=(
        "Evaluate if the product description in {{ outputs }} "
        "is accurate and helpful for the query in {{ inputs }}. "
        "Rate as: excellent, good, fair, or poor"
    ),
    feedback_value_type=Literal["excellent", "good", "fair", "poor"],
    model="anthropic:/claude-opus-4-1-20250805",
)

# Generate traces from your application (minimum 10 required)
traces = []
for i in range(15):  # Generate 15 traces (more than the minimum of 10)
    with mlflow.start_span(f"product_description_{i}") as span:
        # Your application logic
        query = f"Product query {i}"
        description = f"Product description for query {i}"
        span.set_inputs({"query": query})
        span.set_outputs({"description": description})
        traces.append(span.trace_id)

# Run the judge on these traces to get initial assessments
for trace_id in traces:
    trace = mlflow.get_trace(trace_id)

    # Extract inputs and outputs from the trace for field-based evaluation
    inputs = trace.data.spans[0].inputs  # Get inputs from the trace
    outputs = trace.data.spans[0].outputs  # Get outputs from the trace

    # Judge evaluates using the field-based approach (inputs/outputs)
    judge_result = initial_judge(inputs=inputs, outputs=outputs)
    # The judge's assessment is automatically logged when called

Step 2: Collect Human Feedback

After running your judge over the traces, you need to collect human feedback. You can either:

  • Use the MLflow UI (recommended): review traces and add feedback through an intuitive interface
  • Log feedback programmatically: if you already have ground truth labels (a sketch follows below)

For detailed instructions on collecting feedback, see Collecting Feedback for Alignment below.
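
If you already have ground truth labels, a minimal programmatic sketch might look like the following; the label values and reviewer ID are illustrative, and the assessment name must match the judge name:

python
# Sketch: log human feedback programmatically from existing labels.
import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

ground_truth = {trace_id: "good" for trace_id in traces}  # illustrative labels

for trace_id, label in ground_truth.items():
    mlflow.log_feedback(
        trace_id=trace_id,
        name="product_quality",  # must match the judge name exactly
        value=label,
        source=AssessmentSource(
            source_type=AssessmentSourceType.HUMAN,
            source_id="reviewer@example.com",  # illustrative reviewer ID
        ),
    )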

Step 3: Align and Register

Once feedback is collected, align your judge and register it:

python
# Retrieve traces with both judge and human assessments
traces_for_alignment = mlflow.search_traces(
    experiment_ids=[experiment_id], max_results=15, return_type="list"
)

# Align the judge using human corrections (minimum 10 traces recommended)
if len(traces_for_alignment) >= 10:
    optimizer = SIMBAAlignmentOptimizer(model="anthropic:/claude-opus-4-1-20250805")

    # Run alignment - shows minimal progress by default:
    # INFO: Starting SIMBA optimization with 15 examples (set logging to DEBUG for detailed output)
    # INFO: SIMBA optimization completed
    aligned_judge = initial_judge.align(traces_for_alignment, optimizer)

    # Register the aligned judge
    aligned_judge.register(experiment_id=experiment_id)
    print("Judge aligned successfully with human feedback")
else:
    print(f"Need at least 10 traces for alignment, have {len(traces_for_alignment)}")

The SIMBA Alignment Optimizer

MLflow provides an implementation of SIMBA (Simplified Multi-Bootstrap Aggregation) built on DSPy as the **default alignment optimizer**. When you call align() without specifying an optimizer, the SIMBA optimizer is used automatically.

python
# Default: Uses SIMBA optimizer automatically
aligned_judge = initial_judge.align(traces_with_feedback)

# Explicit: Same as above but with custom model specification
from mlflow.genai.judges.optimizers import SIMBAAlignmentOptimizer

optimizer = SIMBAAlignmentOptimizer(
    model="anthropic:/claude-opus-4-1-20250805"  # Model used for optimization
)
aligned_judge = initial_judge.align(traces_with_feedback, optimizer)

# Requirements for alignment:
# - Minimum 10 traces with BOTH judge assessments and human feedback
# - Both assessments must use the same name (matching the judge name)
# - Order doesn't matter - humans can assess before or after judge
# - Mix of agreements and disagreements between judge and human recommended

Default Optimizer Behavior

When align() is called without an optimizer argument, MLflow automatically uses the SIMBA optimizer. This simplifies alignment while still allowing customization when needed.

Controlling Optimization Output

By default, alignment shows minimal progress information to keep logs clean. If you need to debug the optimization process or see detailed iteration-by-iteration progress, enable DEBUG logging:

python
import logging

# Enable detailed optimization output
logging.getLogger("mlflow.genai.judges.optimizers.simba").setLevel(logging.DEBUG)

# Now alignment will show:
# - Detailed iteration-by-iteration progress
# - Score improvements at each step
# - Strategy selection details
# - Full DSPy optimization output

aligned_judge = initial_judge.align(traces_with_feedback, optimizer)

# Reset to default (minimal output) after debugging
logging.getLogger("mlflow.genai.judges.optimizers.simba").setLevel(logging.INFO)

When to Use Detailed Logging

Enable DEBUG logging when:

  • Optimization seems stuck or is taking too long
  • You want to understand how the optimizer improves the instructions
  • You are debugging alignment failures or unexpected results
  • You want to learn how the SIMBA optimizer works

For production environments, keep it at INFO (the default) to avoid verbose output.

Collecting Feedback for Alignment

The quality of alignment depends on the quality and quantity of the feedback. Choose the approach that best fits your situation:

Feedback Collection Methods

When to use: you don't have existing ground truth labels and need to collect human feedback.

The MLflow UI provides an intuitive interface for reviewing traces and adding feedback:

  1. Navigate to the "Traces" tab in your experiment
  2. Click an individual trace to review its inputs, outputs, and any existing judge assessments
  3. Add feedback by clicking the "Add Feedback" button
  4. Choose an assessment name that matches your judge name (e.g., "product_quality")
  5. Provide a rating based on your evaluation criteria

Tips for Effective Feedback Collection

  • If you are **not a domain expert**: distribute traces to team members or domain experts for review.
  • If you **are a domain expert**: create a rubric or guideline document to ensure consistency.
  • For **multiple reviewers**: organize feedback sessions where reviewers work through batches together.
  • To **stay consistent**: document your evaluation criteria clearly before you begin.

The UI automatically records feedback in the correct format for alignment.

MLflow UI Feedback Interface

Multiple Reviewers

Incorporate feedback from several experts to capture diverse perspectives and reduce individual bias.

Balanced Examples

Include both positive and negative examples, with at least 30% of each, to help the judge learn the boundaries.

Sufficient Volume

Collect at least 10 feedback examples (SIMBA's minimum), though 50-100 examples typically produce better results.

Consistent Standards

Make sure reviewers apply consistent criteria. Provide guidelines or a scoring rubric to standardize evaluations.

Custom Alignment Optimizers

MLflow's alignment system is designed as a **plugin architecture** that lets you create custom optimizers implementing different alignment strategies. This extensibility allows domain-specific optimization approaches while still leveraging MLflow's judge infrastructure.

Creating a Custom Optimizer

To create a custom alignment optimizer, extend the AlignmentOptimizer abstract base class:

python
from mlflow.genai.judges.base import AlignmentOptimizer, Judge
from mlflow.entities.trace import Trace


class MyCustomOptimizer(AlignmentOptimizer):
    """Custom optimizer implementation for judge alignment."""

    def __init__(self, model: str = None, **kwargs):
        """Initialize your optimizer with custom parameters."""
        self.model = model
        # Add any custom initialization logic

    def align(self, judge: Judge, traces: list[Trace]) -> Judge:
        """
        Implement your alignment algorithm.

        Args:
            judge: The judge to be optimized
            traces: List of traces containing human feedback

        Returns:
            A new Judge instance with improved alignment
        """
        # Your custom alignment logic here
        # 1. Extract feedback from traces
        # 2. Analyze disagreements between judge and human
        # 3. Generate improved instructions
        # 4. Return new judge with better alignment

        # Example: Return judge with modified instructions
        from mlflow.genai.judges import make_judge

        improved_instructions = self._optimize_instructions(judge.instructions, traces)

        return make_judge(
            name=judge.name,
            instructions=improved_instructions,
            feedback_value_type=str,
            model=judge.model,
        )

    def _optimize_instructions(self, instructions: str, traces: list[Trace]) -> str:
        """Your custom optimization logic."""
        # Implement your optimization strategy
        pass

Using a Custom Optimizer

Once implemented, use your custom optimizer just like the built-in one:

python
# Create your custom optimizer
custom_optimizer = MyCustomOptimizer(model="your-model")

# Use it for alignment
aligned_judge = initial_judge.align(traces_with_feedback, custom_optimizer)

Available Optimizers

MLflow currently provides SIMBAAlignmentOptimizer, the default described above, as its built-in alignment optimizer.

The plugin architecture ensures that new optimization strategies can be added without modifying the core judge system, enabling extensibility and experimentation with different alignment approaches.

Testing Alignment Effectiveness

Verify that alignment actually improved your judge:

python
def test_alignment_improvement(
    original_judge, aligned_judge, test_traces: list
) -> dict:
    """Compare judge performance before and after alignment."""

    original_correct = 0
    aligned_correct = 0
    total = 0  # Only count traces that carry human ground truth

    for trace in test_traces:
        # Get human ground truth from trace assessments
        feedbacks = trace.search_assessments(type="feedback")
        human_feedback = next(
            (f for f in feedbacks if f.source.source_type == "HUMAN"), None
        )

        if not human_feedback:
            continue

        total += 1

        # Get judge evaluations
        original_eval = original_judge(trace=trace)
        aligned_eval = aligned_judge(trace=trace)

        # Check agreement with the human assessment
        if original_eval.value == human_feedback.value:
            original_correct += 1
        if aligned_eval.value == human_feedback.value:
            aligned_correct += 1

    if total == 0:
        return {"original_accuracy": 0.0, "aligned_accuracy": 0.0, "improvement": 0.0}

    return {
        "original_accuracy": original_correct / total,
        "aligned_accuracy": aligned_correct / total,
        "improvement": (aligned_correct - original_correct) / total,
    }
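
An illustrative way to use this helper is to fetch a set of traces that carry human feedback, ideally ones not used during alignment, and compare the two judges:

python
# Illustrative usage of the comparison helper above
test_traces = mlflow.search_traces(
    experiment_ids=[experiment_id], max_results=20, return_type="list"
)

results = test_alignment_improvement(initial_judge, aligned_judge, test_traces)
print(
    f"Original: {results['original_accuracy']:.0%}, "
    f"aligned: {results['aligned_accuracy']:.0%}, "
    f"improvement: {results['improvement']:+.0%}"
)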

Next Steps