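Auto-rewrite Prompts for New Models (Experimental)

When you migrate to a new language model, carefully crafted prompts often perform worse on the new model. MLflow's mlflow.genai.optimize_prompts() API helps you **automatically rewrite prompts** so that output quality is preserved when you switch models, using your existing application's outputs as training data.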

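Key Benefits

Model migration: Switch between language models while keeping outputs consistent
Automatic optimization: Rewrite prompts automatically based on your existing data
No ground truth required: No human labeling is needed when you optimize prompts against existing outputs
Trace-aware: Use MLflow Tracing to understand how prompts are used
Flexible: Works with any function that uses the MLflow Prompt Registry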

Version Requirements

The optimize_prompts API requires **MLflow >= 3.5.0**.

[Figure: Model Migration Workflow]

Example: Simple Prompt → Optimized Prompt

Before optimization

text
Classify the sentiment. Answer 'positive'
or 'negative' or 'neutral'.

Text: {{text}}

After optimization

text
Classify the sentiment of the provided text.
Your response must be one of the following:
- 'positive'
- 'negative'
- 'neutral'

Ensure your response is lowercase and contains
only one of these three words.

Text: {{text}}

Guidelines:
- 'positive': The text expresses satisfaction,
  happiness, or approval
- 'negative': The text expresses dissatisfaction,
  anger, or disapproval
- 'neutral': The text is factual or balanced
  without strong emotion

Your response must match this exact format with
no additional explanation.
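
Both templates carry the input through the {{text}} placeholder. The sketch below shows one way such double-brace placeholders can be filled in. It is an illustrative stand-in, not MLflow's implementation, and the render helper is a hypothetical name introduced for this example.

```python
import re


def render(template: str, **variables: str) -> str:
    """Fill {{name}} placeholders; raise KeyError if a variable is missing.

    Hypothetical helper for illustration only, not part of MLflow.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return variables[name]

    # Match {{ name }} with optional whitespace inside the braces.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


base_template = (
    "Classify the sentiment. Answer 'positive' or 'negative' or 'neutral'.\n"
    "Text: {{text}}"
)
rendered = render(base_template, text="Great product!")
print(rendered)
```

Raising on a missing variable, rather than leaving the placeholder in place, makes a malformed prompt fail early instead of reaching the model.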

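When to Use Prompt Rewriting

This approach works best in the following situations:

Downgrading models: Moving from gpt-5 → gpt-4o-mini to reduce cost
Switching providers: Moving from OpenAI to Anthropic, or the reverse
Performance optimization: Moving to a faster model while maintaining quality
You have existing outputs: Your current system already produces good results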

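Quick Start: Model Migration Workflow

The following is a complete example of migrating from gpt-5 to gpt-4o-mini while keeping outputs consistent.

Step 1: Capture Outputs from the Original Model

First, collect outputs from your existing model using MLflow Tracing.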

python
import mlflow
import openai
from mlflow.genai.optimize import GepaPromptOptimizer
from mlflow.genai.datasets import create_dataset
from mlflow.genai.scorers import Equivalence

# Register your current prompt
prompt = mlflow.genai.register_prompt(
    name="sentiment",
    template="""Classify the sentiment. Answer 'positive' or 'negative' or 'neutral'.
Text: {{text}}""",
)


# Define your prediction function using the original model and base prompt
@mlflow.trace
def predict_fn_base_model(text: str) -> str:
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-5",  # Original model
        messages=[{"role": "user", "content": prompt.format(text=text)}],
    )
    return completion.choices[0].message.content.lower()


# Example inputs - each record contains an "inputs" dict with the function's input parameters
inputs = [
    {
        "inputs": {
            "text": "This movie was absolutely fantastic! I loved every minute of it."
        }
    },
    {"inputs": {"text": "The service was terrible and the food arrived cold."}},
    {"inputs": {"text": "It was okay, nothing special but not bad either."}},
    {
        "inputs": {
            "text": "I'm so disappointed with this purchase. Complete waste of money."
        }
    },
    {"inputs": {"text": "Best experience ever! Highly recommend to everyone."}},
    {"inputs": {"text": "The product works as described. No complaints."}},
    {"inputs": {"text": "I can't believe how amazing this turned out to be!"}},
    {"inputs": {"text": "Worst customer support I've ever dealt with."}},
    {"inputs": {"text": "It's fine for the price. Gets the job done."}},
    {"inputs": {"text": "This exceeded all my expectations. Truly wonderful!"}},
]

# Collect outputs from original model
with mlflow.start_run() as run:
    for record in inputs:
        predict_fn_base_model(**record["inputs"])
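
The validation step later in this guide evaluates on a separate test set. One way to obtain it is to hold out a few inputs before optimization. The sketch below is a minimal, deterministic shuffle-and-split in plain Python. The split_records helper, the 20% fraction, and the seed are assumptions for illustration, not part of MLflow.

```python
import random


def split_records(records: list[dict], test_fraction: float = 0.2, seed: int = 0):
    """Return (train, test) lists after a seeded shuffle.

    Hypothetical helper; the fraction and seed are illustrative defaults.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    if len(records) < 2:
        raise ValueError("need at least two records to split")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records in the same shape as the inputs list above.
sample_inputs = [{"inputs": {"text": f"example {i}"}} for i in range(10)]
train_inputs, test_inputs = split_records(sample_inputs)
print(len(train_inputs), len(test_inputs))  # 8 2
```

Fixing the seed keeps the split reproducible, so the held-out examples never leak into the optimization data across reruns.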

Step 2: Create a Training Dataset from Traces

Convert the traced outputs into a training dataset.

python
# Create dataset
dataset = create_dataset(name="sentiment_migration_dataset")

# Retrieve traces from the run
traces = mlflow.search_traces(return_type="list", run_id=run.info.run_id)

# Merge traces into dataset
dataset.merge_records(traces)

This automatically creates a dataset that contains:

inputs: The input variables (text in this case)
outputs: The actual outputs from the original model (gpt-5)
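
To make the pairing concrete, the sketch below builds records of that general shape in plain Python. It is only an illustration of the inputs/outputs pairing. The exact schema MLflow stores may include additional fields, and build_records is a hypothetical helper, not an MLflow function.

```python
def build_records(inputs: list[dict], outputs: list[str]) -> list[dict]:
    """Pair each input record with the output the original model produced.

    Hypothetical helper for illustration; MLflow derives this from traces.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    return [
        {"inputs": dict(record["inputs"]), "outputs": output}
        for record, output in zip(inputs, outputs)
    ]


# Placeholder outputs standing in for what the original model returned.
example_inputs = [
    {"inputs": {"text": "The service was terrible and the food arrived cold."}},
    {"inputs": {"text": "It was okay, nothing special but not bad either."}},
]
example_outputs = ["negative", "neutral"]

records = build_records(example_inputs, example_outputs)
for record in records:
    print(record)
```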

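You can view the created dataset in the MLflow UI by navigating to:

Experiments tab → Select your experiment
Evaluations tab → Select the "Datasets" tab in the left sidebar
Dataset tab → Inspect the input/output pairs

The dataset view shows every input and output collected from the traces, which makes it easy to verify the training data before optimization.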

Step 3: Switch Models

Switch your LM to the target model.

python
# Define function using target model
@mlflow.trace
def predict_fn(text: str) -> str:
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-4o-mini",  # Target model
        messages=[{"role": "user", "content": prompt.format(text=text)}],
        temperature=0,
    )
    return completion.choices[0].message.content.lower()

You may notice that the target model follows the output format less consistently than the original model.
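
One rough way to quantify that inconsistency is to measure how many outputs are exactly one of the three allowed labels. The sketch below is a plain-Python check under that assumption. The sample outputs are made up for illustration and are not results from either model.

```python
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def is_compliant(output: str) -> bool:
    """True only if the output is exactly one allowed label, ignoring surrounding whitespace."""
    return output.strip() in ALLOWED_LABELS


def compliance_rate(outputs: list[str]) -> float:
    """Fraction of outputs that follow the expected format (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_compliant(o) for o in outputs) / len(outputs)


# Hypothetical outputs: two clean labels, one with trailing punctuation, one with extra words.
sample_outputs = ["positive", "negative.", "the sentiment is neutral", "neutral"]
rate = compliance_rate(sample_outputs)
print(f"{rate:.2f}")  # 0.50
```

Running the same check on outputs collected before and after optimization gives a quick signal of whether the rewritten prompt improved format adherence.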

Step 4: Optimize the Prompt for the Target Model

Use the collected dataset to optimize the prompt for the target model.

python
# Optimize prompts for the target model
result = mlflow.genai.optimize_prompts(
    predict_fn=predict_fn,
    train_data=dataset,
    prompt_uris=[prompt.uri],
    optimizer=GepaPromptOptimizer(reflection_model="openai:/gpt-5"),
    scorers=[Equivalence(model="openai:/gpt-5")],
)

# View the optimized prompt
optimized_prompt = result.optimized_prompts[0]
print(f"Optimized template: {optimized_prompt.template}")

The optimized prompt will include additional instructions that help gpt-4o-mini match the behavior of gpt-5:

text
Optimized template:
Classify the sentiment of the provided text. Your response must be one of the following:
- 'positive'
- 'negative'
- 'neutral'

Ensure your response is lowercase and contains only one of these three words.

Text: {{text}}

Guidelines:
- 'positive': The text expresses satisfaction, happiness, or approval
- 'negative': The text expresses dissatisfaction, anger, or disapproval
- 'neutral': The text is factual or balanced without strong emotion

Your response must match this exact format with no additional explanation.
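
Before relying on the optimized prompt, it can help to check how often the target model's outputs agree with the original model's outputs. The sketch below uses case-insensitive exact matching, which is a much cruder stand-in than the model-based Equivalence scorer used in the optimization call and is not how that scorer works. The output lists are hypothetical.

```python
def agreement_rate(reference: list[str], candidate: list[str]) -> float:
    """Fraction of positions where both outputs match, ignoring case and surrounding whitespace."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    if not reference:
        return 0.0
    matches = sum(
        r.strip().lower() == c.strip().lower() for r, c in zip(reference, candidate)
    )
    return matches / len(reference)


# Hypothetical outputs: reference from the original model, candidate from the target model.
reference_outputs = ["positive", "negative", "neutral", "negative"]
candidate_outputs = ["positive", "Negative", "positive", "negative"]

print(agreement_rate(reference_outputs, candidate_outputs))  # 0.75
```

Exact matching suits a closed label set like this one. For free-form outputs, a semantic comparison is more appropriate.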

Step 5: Use the Optimized Prompt

Deploy the optimized prompt in your application.

python
# Load the optimized prompt
optimized = mlflow.genai.load_prompt(optimized_prompt.uri)


# Use in production
@mlflow.trace
def predict_fn_optimized(text: str) -> str:
    completion = openai.OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": optimized.format(text=text)}],
        temperature=0,
    )
    return completion.choices[0].message.content.lower()


# Test with new inputs
test_result = predict_fn_optimized("This product is amazing!")
print(test_result)  # Expected: positive

Best Practices

1. Collect Enough Data

For best results, collect outputs for at least 20 to 50 diverse examples.

python
# ✅ Good: Diverse examples
inputs = [
    {"inputs": {"text": "Great product!"}},
    {
        "inputs": {
            "text": "The delivery was delayed by three days and the packaging was damaged. The product itself works fine but the experience was disappointing overall."
        }
    },
    {
        "inputs": {
            "text": "It meets the basic requirements. Nothing more, nothing less."
        }
    },
    # ... more varied examples
]

# ❌ Poor: Too few, too similar
inputs = [
    {"inputs": {"text": "Good"}},
    {"inputs": {"text": "Bad"}},
]
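
A quick summary of the input set can flag the "too few, too similar" case before any model calls are made. The sketch below is a plain-Python check. The summarize_inputs helper and the statistics it reports are assumptions chosen for illustration, not an MLflow feature.

```python
from statistics import mean


def summarize_inputs(records: list[dict]) -> dict:
    """Report size, duplicates, and length spread for records shaped like {"inputs": {"text": ...}}."""
    if not records:
        raise ValueError("records must not be empty")
    texts = [record["inputs"]["text"] for record in records]
    word_counts = [len(text.split()) for text in texts]
    return {
        "count": len(texts),
        # Case-insensitive uniqueness catches near-verbatim repeats.
        "unique": len({text.strip().lower() for text in texts}),
        "min_words": min(word_counts),
        "max_words": max(word_counts),
        "mean_words": mean(word_counts),
    }


# Deliberately weak sample: tiny, repetitive, and all one word long.
weak_inputs = [
    {"inputs": {"text": "Good"}},
    {"inputs": {"text": "Bad"}},
    {"inputs": {"text": "good"}},
]
summary = summarize_inputs(weak_inputs)
print(summary)
```

A low count, a unique value below count, or identical min_words and max_words all suggest the set needs more varied examples.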

2. Use Representative Examples

Include edge cases and challenging inputs.

python
inputs = [
    {"inputs": {"text": "Absolutely fantastic!"}},  # Clear positive
    {"inputs": {"text": "It's not bad, I guess."}},  # Ambiguous
    {"inputs": {"text": "The food was good but service terrible."}},  # Mixed sentiment
]
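
One way to check that the set covers every kind of input is to tag each example with a category and count the tags. The sketch below assumes hand-assigned tags. The category names and the missing_categories helper are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Each entry pairs an input record with a hand-assigned, hypothetical category tag.
tagged_inputs = [
    ({"inputs": {"text": "Absolutely fantastic!"}}, "clear_positive"),
    ({"inputs": {"text": "It's not bad, I guess."}}, "ambiguous"),
    ({"inputs": {"text": "The food was good but service terrible."}}, "mixed"),
]

REQUIRED_CATEGORIES = ["clear_positive", "clear_negative", "ambiguous", "mixed"]


def missing_categories(tagged: list[tuple[dict, str]], required: list[str]) -> list[str]:
    """Return the required categories that have no example yet, in the order given."""
    counts = Counter(tag for _, tag in tagged)
    return [category for category in required if counts[category] == 0]


print(missing_categories(tagged_inputs, REQUIRED_CATEGORIES))  # ['clear_negative']
```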

3. Validate Results

Always test the optimized prompt with mlflow.genai.evaluate() before deploying to production.

python
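# Evaluate the optimized prompt on held-out data.
# test_dataset, accuracy_scorer, and format_scorer are placeholders:
# define them for your application before running this snippet.
results = mlflow.genai.evaluate(
    data=test_dataset,
    predict_fn=predict_fn_optimized,
    scorers=[accuracy_scorer, format_scorer],
)

# Metric keys depend on scorer names and aggregations; inspect results.metrics to confirm them.
print(f"Accuracy: {results.metrics['accuracy']}")
print(f"Format compliance: {results.metrics['format_scorer']}")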
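
The snippet above does not define accuracy_scorer or format_scorer. The sketch below shows the kind of logic they might contain, written as plain Python functions so it runs without MLflow. To pass them to mlflow.genai.evaluate() they would likely need to be wrapped with the scorer decorator from mlflow.genai.scorers, so check the custom scorer documentation for the exact signature. The expected_label key is a hypothetical name for the ground-truth field.

```python
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def format_scorer(outputs: str) -> bool:
    """Pass only when the output is exactly one allowed lowercase label."""
    return outputs.strip() in ALLOWED_LABELS


def accuracy_scorer(outputs: str, expectations: dict) -> bool:
    """Pass when the output matches the expected label, ignoring case and whitespace.

    "expected_label" is a hypothetical key chosen for this sketch.
    """
    expected = expectations["expected_label"]
    return outputs.strip().lower() == expected.strip().lower()


# Hypothetical rows: (model output, expectations dict).
rows = [
    ("positive", {"expected_label": "positive"}),
    ("Neutral", {"expected_label": "neutral"}),
    ("negative", {"expected_label": "positive"}),
]
for output, expectations in rows:
    print(output, format_scorer(output), accuracy_scorer(output, expectations))
```

Keeping format and accuracy as separate checks shows whether a failure comes from a wrong label or from a correct label in the wrong format.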

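See Also

Optimize Prompts: General guide to prompt optimization
Create and Edit Prompts: Prompt Registry basics
Evaluate Prompts: Evaluating prompt performance
MLflow Tracing: Understanding MLflow Tracing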
