Structured Outputs
The MLflow Prompt Registry supports defining structured output schemas for your prompts, ensuring that language model responses follow a consistent format and can be validated. This feature is particularly useful for applications that need to parse and process model outputs programmatically.
Overview
Structured outputs allow you to:
- Define the expected response format using Pydantic models or JSON Schema
- Validate model responses against the schema you define
- Ensure consistency across model calls
- Improve integration with downstream applications
- Enable type safety in your GenAI applications
Note
Important: The response_format parameter is for tracking and documentation purposes, not direct runtime enforcement. MLflow stores this information as metadata to help you understand a prompt's expected output structure, but it does not automatically validate or enforce the format during model execution. You are responsible for implementing the actual validation and enforcement in your application code.
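A minimal sketch of what that application-side validation could look like, assuming Pydantic v2 is installed and using the same SummaryResponse model defined in the next section:

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class SummaryResponse(BaseModel):
    summary: str
    key_points: List[str]
    word_count: int


def validate_response(raw: str) -> Optional[SummaryResponse]:
    """Parse and validate a raw model response; return None if it doesn't match."""
    try:
        return SummaryResponse.model_validate_json(raw)
    except ValidationError:
        return None


ok = validate_response(
    '{"summary": "MLflow manages the ML lifecycle.", "key_points": ["tracking"], "word_count": 6}'
)
bad = validate_response('{"summary": "missing the other fields"}')
print(ok is not None, bad is None)  # True True
```

Returning None on failure is just one option; your application might instead retry the model call or raise an error.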
Basic Usage
Using Pydantic Models
The most common way to define structured outputs is with Pydantic models:
import mlflow
from pydantic import BaseModel
from typing import List


class SummaryResponse(BaseModel):
    summary: str
    key_points: List[str]
    word_count: int


# Register prompt with structured output
prompt = mlflow.genai.register_prompt(
    name="summarization-prompt",
    template="Summarize the following text in {{ num_sentences }} sentences: {{ text }}",
    response_format=SummaryResponse,
    commit_message="Added structured output for summarization",
    tags={"task": "summarization", "structured": "true"},
)
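If you want to see what schema a Pydantic model corresponds to, Pydantic v2 can render it as a JSON Schema dictionary (a sketch, assuming Pydantic v2; MLflow stores a comparable representation as metadata):

```python
from typing import List

from pydantic import BaseModel


class SummaryResponse(BaseModel):
    summary: str
    key_points: List[str]
    word_count: int


# Render the model as a JSON Schema dictionary.
schema = SummaryResponse.model_json_schema()
print(sorted(schema["properties"]))  # ['key_points', 'summary', 'word_count']
print(schema["type"])  # object
```

This makes it easy to move between the Pydantic style shown above and the JSON Schema style shown next.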
Using JSON Schema
You can also define the response format with a JSON Schema dictionary:
import mlflow

# Define response format as JSON schema
response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string", "description": "The main answer"},
        "confidence": {"type": "number", "description": "Confidence score (0-1)"},
        "sources": {
            "type": "array",
            "items": {"type": "string"},
            "description": "List of source references",
        },
    },
    "required": ["answer", "confidence"],
}

# Register prompt with JSON schema
prompt = mlflow.genai.register_prompt(
    name="qa-prompt",
    template="Answer the following question: {{ question }}",
    response_format=response_schema,
    commit_message="Added structured output for Q&A",
    tags={"task": "qa", "structured": "true"},
)
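Since MLflow only stores this schema as metadata, enforcement is up to you. A deliberately minimal, hand-rolled check against a schema dictionary like the one above might look like this (illustrative only; a real application would typically use a full validator such as the jsonschema package):

```python
import json

# Map JSON Schema type names to the Python types json.loads produces.
TYPE_MAP = {"object": dict, "string": str, "number": (int, float), "array": list}


def check_against_schema(raw: str, schema: dict) -> bool:
    """Return True if the payload parses as JSON and has the required keys with the declared types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, TYPE_MAP[schema["type"]]):
        return False
    if any(key not in data for key in schema.get("required", [])):
        return False
    for key, value in data.items():
        prop = schema["properties"].get(key)
        if prop and not isinstance(value, TYPE_MAP[prop["type"]]):
            return False
    return True


response_schema = {
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
        "sources": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["answer", "confidence"],
}

print(check_against_schema('{"answer": "MLflow", "confidence": 0.9}', response_schema))  # True
print(check_against_schema('{"answer": "MLflow"}', response_schema))  # False
```

Note this sketch only checks top-level keys and types; nested constraints such as array item types are ignored, which is exactly the kind of gap a full JSON Schema validator closes.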
Advanced Examples
Complex Response Formats
For more complex applications, you can define nested structures:
import mlflow
from pydantic import BaseModel
from typing import List, Optional
from datetime import datetime


class AnalysisResult(BaseModel):
    sentiment: str
    confidence: float
    entities: List[str]
    summary: str


class DocumentAnalysis(BaseModel):
    document_id: str
    analysis: AnalysisResult
    processed_at: datetime
    metadata: Optional[dict] = None


# Register prompt with complex structured output
prompt = mlflow.genai.register_prompt(
    name="document-analyzer",
    template="Analyze the following document: {{ document_text }}",
    response_format=DocumentAnalysis,
    commit_message="Added comprehensive document analysis output",
    tags={"task": "analysis", "complex": "true"},
)
Chat Prompts with Structured Output
Chat prompts can also use structured output formats:
import mlflow
from pydantic import BaseModel
from typing import List


class ChatResponse(BaseModel):
    response: str
    tone: str
    suggestions: List[str]


# Chat prompt with structured output
chat_template = [
    {"role": "system", "content": "You are a helpful {{ style }} assistant."},
    {"role": "user", "content": "{{ question }}"},
]

prompt = mlflow.genai.register_prompt(
    name="assistant-chat",
    template=chat_template,
    response_format=ChatResponse,
    commit_message="Added structured output for chat responses",
    tags={"type": "chat", "structured": "true"},
)
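For chat prompts, prompt.format() fills the {{ ... }} placeholders in every message. The following is not MLflow's implementation, just a rough stdlib approximation of that substitution for illustration:

```python
import re

chat_template = [
    {"role": "system", "content": "You are a helpful {{ style }} assistant."},
    {"role": "user", "content": "{{ question }}"},
]


def fill_chat_template(template, **values):
    """Substitute {{ name }} placeholders in each message's content field."""

    def substitute(text):
        return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), text)

    return [{**msg, "content": substitute(msg["content"])} for msg in template]


messages = fill_chat_template(chat_template, style="formal", question="What is MLflow?")
print(messages[0]["content"])  # You are a helpful formal assistant.
```

The roles are preserved and only the content strings change, which is why the same template can be reused across styles and questions.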
Loading and Using Structured Prompts
When you load a prompt that has a structured output, you can access its response format for tracking and documentation purposes:
import mlflow

# Load the prompt
prompt = mlflow.genai.load_prompt("prompts:/summarization-prompt/1")

# Check if it has structured output (for tracking purposes)
if prompt.response_format:
    print(f"Response format: {prompt.response_format}")

# Format the prompt
formatted_text = prompt.format(num_sentences=3, text="Your content here...")

# Use with a language model that supports structured output
# Note: You need to implement validation against your defined schema
Integration with Language Models
OpenAI Integration
import json

import mlflow
import openai

client = openai.OpenAI()

# Load prompt with structured output
prompt = mlflow.genai.load_prompt("prompts:/summarization-prompt/1")

# Use with OpenAI's response_format parameter
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "user", "content": prompt.format(num_sentences=3, text="Your text")}
    ],
    response_format=prompt.response_format,  # OpenAI's structured output
)

# Parse the structured output
result = json.loads(response.choices[0].message.content)
LangChain Integration
import mlflow
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Load prompt with structured output
prompt = mlflow.genai.load_prompt("prompts:/qa-prompt/1")

# Create LangChain prompt template
langchain_prompt = PromptTemplate.from_template(prompt.template)

# Use with LangChain's structured output
llm = ChatOpenAI(model="gpt-4")
chain = langchain_prompt | llm.with_structured_output(prompt.response_format)

# Execute the chain
result = chain.invoke({"question": "What is MLflow?"})
# result will be a validated Pydantic model instance
Key Takeaways
- Structured outputs are for tracking and documentation purposes, defining the expected response format
- Pydantic models provide type safety and a validation schema for your response format
- JSON Schema offers flexibility for complex nested structures
- Integration with popular frameworks such as OpenAI and LangChain is straightforward
- Manual validation is required in your application code; MLflow does not enforce formats at runtime
Next Steps
- Create and Edit Prompts to learn the basics of prompt management
- Use Prompts in Applications to see how to integrate prompts into your applications
- Evaluate Prompts to learn how to measure prompt performance
Structured outputs are a powerful feature that can significantly improve the reliability and maintainability of your GenAI applications by ensuring consistent data formats and enabling better integration with downstream systems.