Structured Outputs
The MLflow Prompt Registry supports defining structured output schemas for your prompts, ensuring that language model responses follow a consistent format and can be validated. This feature is particularly useful for applications that need to parse and process model outputs programmatically.
Overview
Structured outputs allow you to:
- Define the expected response format using Pydantic models or JSON Schema
- Validate model responses against the schema you define
- Ensure consistency across model calls
- Improve integration with downstream applications
- Enable type safety in your GenAI applications
Note
Important: The response_format parameter is for tracking and documentation purposes, not direct runtime enforcement. MLflow stores this information as metadata to help you understand a prompt's expected output structure, but it does not automatically validate or enforce the format during model execution. You are responsible for implementing the actual validation and enforcement in your application code.
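A minimal sketch of what that application-side validation could look like, assuming Pydantic v2 is installed and using the same SummaryResponse model defined in the next section:

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class SummaryResponse(BaseModel):
    summary: str
    key_points: List[str]
    word_count: int


def validate_response(raw: str) -> Optional[SummaryResponse]:
    """Parse and validate a raw model response; return None if it doesn't match."""
    try:
        return SummaryResponse.model_validate_json(raw)
    except ValidationError:
        return None


ok = validate_response(
    '{"summary": "MLflow manages the ML lifecycle.", "key_points": ["tracking"], "word_count": 6}'
)
bad = validate_response('{"summary": "missing the other fields"}')
print(ok is not None, bad is None)  # True True
```

Returning None on failure is just one option; your application might instead retry the model call or raise an error.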
Basic Usage
Using Pydantic Models
The most common way to define structured outputs is with Pydantic models:
import mlflow
from pydantic import BaseModel
from typing import List


class SummaryResponse(BaseModel):
    summary: str
    key_points: List[str]
    word_count: int


# Register prompt with structured output
prompt = mlflow.genai.register_prompt(
    name="summarization-prompt",
    template="Summarize the following text in {{ num_sentences }} sentences: {{ text }}",
    response_format=SummaryResponse,
    commit_message="Added structured output for summarization",
    tags={"task": "summarization", "structured": "true"},
)
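If you want to see what schema a Pydantic model corresponds to, Pydantic v2 can render it as a JSON Schema dictionary (a sketch, assuming Pydantic v2; MLflow stores a comparable representation as metadata):

```python
from typing import List

from pydantic import BaseModel


class SummaryResponse(BaseModel):
    summary: str
    key_points: List[str]
    word_count: int


# Render the model as a JSON Schema dictionary.
schema = SummaryResponse.model_json_schema()
print(sorted(schema["properties"]))  # ['key_points', 'summary', 'word_count']
print(schema["type"])  # object
```

This makes it easy to move between the Pydantic style shown above and the JSON Schema style shown next.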
Using JSON Schema
You can also define the response format with a JSON Schema dictionary:
import mlflow

# Define response format as JSON schema
response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string", "description": "The main answer"},
        "confidence": {"type": "number", "description": "Confidence score (0-1)"},
        "sources": {
            "type": "array",
            "items": {"type": "string"},
            "description": "List of source references",
        },
    },
    "required": ["answer", "confidence"],
}

# Register prompt with JSON schema
prompt = mlflow.genai.register_prompt(
    name="qa-prompt",
    template="Answer the following question: {{ question }}",
    response_format=response_schema,
    commit_message="Added structured output for Q&A",
    tags={"task": "qa", "structured": "true"},
)
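Since MLflow only stores this schema as metadata, enforcement is up to you. A deliberately minimal, hand-rolled check against a schema dictionary like the one above might look like this (illustrative only; a real application would typically use a full validator such as the jsonschema package):

```python
import json

# Map JSON Schema type names to the Python types json.loads produces.
TYPE_MAP = {"object": dict, "string": str, "number": (int, float), "array": list}


def check_against_schema(raw: str, schema: dict) -> bool:
    """Return True if the payload parses as JSON and has the required keys with the declared types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, TYPE_MAP[schema["type"]]):
        return False
    if any(key not in data for key in schema.get("required", [])):
        return False
    for key, value in data.items():
        prop = schema["properties"].get(key)
        if prop and not isinstance(value, TYPE_MAP[prop["type"]]):
            return False
    return True


response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
        "sources": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["answer", "confidence"],
}

print(check_against_schema('{"answer": "MLflow", "confidence": 0.9}', response_schema))  # True
print(check_against_schema('{"answer": "MLflow"}', response_schema))  # False
```

Note this sketch only checks top-level keys and types; nested constraints such as array item types are ignored, which is exactly the kind of gap a full JSON Schema validator closes.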
Advanced Examples
Complex Response Formats
For more complex applications, you can define nested structures:
import mlflow
from pydantic import BaseModel
from typing import List, Optional
from datetime import datetime


class AnalysisResult(BaseModel):
    sentiment: str
    confidence: float
    entities: List[str]
    summary: str


class DocumentAnalysis(BaseModel):
    document_id: str
    analysis: AnalysisResult
    processed_at: datetime
    metadata: Optional[dict] = None


# Register prompt with complex structured output
prompt = mlflow.genai.register_prompt(
    name="document-analyzer",
    template="Analyze the following document: {{ document_text }}",
    response_format=DocumentAnalysis,
    commit_message="Added comprehensive document analysis output",
    tags={"task": "analysis", "complex": "true"},
)
Chat Prompts with Structured Output
Chat prompts can also use structured output formats:
import mlflow
from pydantic import BaseModel
from typing import List


class ChatResponse(BaseModel):
    response: str
    tone: str
    suggestions: List[str]


# Chat prompt with structured output
chat_template = [
    {"role": "system", "content": "You are a helpful {{ style }} assistant."},
    {"role": "user", "content": "{{ question }}"},
]

prompt = mlflow.genai.register_prompt(
    name="assistant-chat",
    template=chat_template,
    response_format=ChatResponse,
    commit_message="Added structured output for chat responses",
    tags={"type": "chat", "structured": "true"},
)
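For chat prompts, prompt.format() fills the {{ ... }} placeholders in every message. The following is not MLflow's implementation, just a rough stdlib approximation of that substitution for illustration:

```python
import re

chat_template = [
    {"role": "system", "content": "You are a helpful {{ style }} assistant."},
    {"role": "user", "content": "{{ question }}"},
]


def fill_chat_template(template, **values):
    """Substitute {{ name }} placeholders in each message's content field."""

    def substitute(text):
        return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), text)

    return [{**msg, "content": substitute(msg["content"])} for msg in template]


messages = fill_chat_template(chat_template, style="formal", question="What is MLflow?")
print(messages[0]["content"])  # You are a helpful formal assistant.
```

The roles are preserved and only the content strings change, which is why the same template can be reused across styles and questions.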
Loading and Using Structured Prompts
When you load a prompt that has a structured output, you can access its response format for tracking and documentation purposes:
import mlflow

# Load the prompt
prompt = mlflow.genai.load_prompt("prompts:/summarization-prompt/1")

# Check if it has structured output (for tracking purposes)
if prompt.response_format:
    print(f"Response format: {prompt.response_format}")

# Format the prompt
formatted_text = prompt.format(num_sentences=3, text="Your content here...")

# Use with a language model that supports structured output
# Note: You need to implement validation against your defined schema
Integration with Language Models
OpenAI Integration
import json

import mlflow
import openai

client = openai.OpenAI()

# Load prompt with structured output
prompt = mlflow.genai.load_prompt("prompts:/summarization-prompt/1")

# Use with OpenAI's response_format parameter
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "user", "content": prompt.format(num_sentences=3, text="Your text")}
    ],
    response_format=prompt.response_format,  # OpenAI's structured output
)

# Parse the structured output
result = json.loads(response.choices[0].message.content)
LangChain Integration
import mlflow
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load prompt with structured output
prompt = mlflow.genai.load_prompt("prompts:/qa-prompt/1")

# Create LangChain prompt template
langchain_prompt = PromptTemplate.from_template(prompt.template)

# Use with LangChain's structured output
llm = ChatOpenAI(model="gpt-4")
chain = langchain_prompt | llm.with_structured_output(prompt.response_format)

# Execute the chain
result = chain.invoke({"question": "What is MLflow?"})
# result will be a validated Pydantic model instance
Key Takeaways
- Structured outputs are for tracking and documentation purposes, defining the expected response format
- Pydantic models provide type safety and a validation schema for your response format
- JSON Schema offers flexibility for complex nested structures
- Integration with popular frameworks such as OpenAI and LangChain is straightforward
- Manual validation is required in your application code; MLflow does not enforce formats at runtime
Next Steps
- Create and Edit Prompts to learn the basics of prompt management
- Use Prompts in Applications to see how to integrate prompts into your applications
- Evaluate Prompts to learn how to measure prompt performance
Structured outputs are a powerful feature that can significantly improve the reliability and maintainability of your GenAI applications by ensuring consistent data formats and enabling better integration with downstream systems.