
AI Gateway Usage

Learn how to query your AI Gateway endpoints, integrate them into applications, and work with the available APIs and tools.

Basic Queries

REST API Requests

The gateway exposes REST endpoints that follow an OpenAI-compatible schema. Each endpoint accepts a JSON payload and returns a structured response. Use these endpoints when integrating with applications that do not have the MLflow client library.

# Chat completions
curl -X POST https://:5000/gateway/chat/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

# Text completions
curl -X POST https://:5000/gateway/completions/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The future of AI is",
    "max_tokens": 100
  }'

# Embeddings
curl -X POST https://:5000/gateway/embeddings/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Text to embed"
  }'
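The same requests can be issued from Python without the MLflow client. The sketch below uses the `requests` library; `invocation_url` and `query_gateway` are illustrative helper names, not part of MLflow, and the route names ("chat", "completions", "embeddings") follow the endpoints shown above.

```python
import json
from urllib.parse import urljoin

import requests


def invocation_url(base_url, route):
    """Build the invocation URL for a gateway route, e.g. 'chat' or 'embeddings'."""
    return urljoin(base_url.rstrip("/") + "/", f"gateway/{route}/invocations")


def query_gateway(base_url, route, payload, timeout=30):
    """POST a JSON payload to a gateway route and return the parsed response."""
    resp = requests.post(
        invocation_url(base_url, route),
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload),
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()
```

For example, `query_gateway(base, "chat", {"messages": [...]})` mirrors the first curl command above.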

Query Parameters

These parameters control model behavior and are supported by most providers. Individual models may support only a subset of them.

Chat Completions

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"}
  ],
  "temperature": 0.7,
  "max_tokens": 150,
  "top_p": 0.9,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0,
  "stop": ["\n\n"],
  "stream": false
}
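When many call sites share these parameters, it can help to assemble payloads in one place. A minimal sketch, where the `chat_payload` helper and its validation ranges are illustrative, not an MLflow API:

```python
def chat_payload(messages, temperature=0.7, max_tokens=150, top_p=0.9,
                 frequency_penalty=0.0, presence_penalty=0.0,
                 stop=None, stream=False):
    """Assemble a chat-completions payload, validating common parameter ranges."""
    # Ranges below reflect typical provider limits; exact bounds vary by model.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is typically constrained to [0, 2]")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    payload = {
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "stream": stream,
    }
    if stop is not None:
        payload["stop"] = stop
    return payload
```

Omitting `stop` when unset avoids sending a null field that some providers reject.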

Text Completions

{
  "prompt": "Once upon a time",
  "temperature": 0.8,
  "max_tokens": 100,
  "top_p": 1.0,
  "frequency_penalty": 0.0,
  "presence_penalty": 0.0,
  "stop": [".", "!"],
  "stream": false
}

Embeddings

{
  "input": ["Text to embed", "Another text"],
  "encoding_format": "float"
}

Streaming Responses

Enable streaming to receive partial responses in real time as they are generated, rather than waiting for the complete response.

curl -X POST https://:5000/gateway/chat/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Write a story"}],
    "stream": true
  }'
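With `"stream": true`, responses typically arrive as server-sent events: each line is prefixed with `data: ` and carries a JSON chunk, with a final `data: [DONE]` sentinel, assuming the OpenAI-compatible streaming format. A sketch of parsing such lines, where `parse_sse_chunks` is an illustrative helper, not part of MLflow:

```python
import json


def parse_sse_chunks(lines):
    """Extract content deltas from OpenAI-style SSE lines ('data: {...}')."""
    # Assumes the gateway streams OpenAI-compatible chunks ending with 'data: [DONE]'.
    pieces = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            pieces.append(delta["content"])
    return "".join(pieces)
```

In practice you would feed this the line iterator of a streaming HTTP response rather than a list.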

Python Client Integration

MLflow Deployments Client

The MLflow deployments client provides a Python interface that handles authentication, error handling, and response parsing. Use this client when building Python applications.

from mlflow.deployments import get_deploy_client

# Create a client for the gateway
client = get_deploy_client("https://:5000")

# Query a chat endpoint
response = client.predict(
    endpoint="chat",
    inputs={"messages": [{"role": "user", "content": "What is MLflow?"}]},
)

print(response["choices"][0]["message"]["content"])

Advanced Client Usage

Build reusable functions for common operations such as streaming responses and batch embedding generation.

from mlflow.deployments import get_deploy_client

# Initialize client
client = get_deploy_client("https://:5000")


# Chat with streaming (predict_stream yields response chunks as they arrive)
def stream_chat(prompt):
    for chunk in client.predict_stream(
        endpoint="chat",
        inputs={
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.7,
        },
    ):
        if chunk["choices"][0]["delta"].get("content"):
            print(chunk["choices"][0]["delta"]["content"], end="")


# Generate embeddings
def get_embeddings(texts):
    response = client.predict(endpoint="embeddings", inputs={"input": texts})
    return [item["embedding"] for item in response["data"]]


# Example usage
stream_chat("Explain quantum computing")
embeddings = get_embeddings(["Hello world", "MLflow AI Gateway"])
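Embedding vectors returned this way are often compared with cosine similarity. A dependency-free sketch, where `cosine_similarity` is an illustrative helper:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

For example, `cosine_similarity(embeddings[0], embeddings[1])` scores how semantically close the two input texts are, with 1.0 meaning identical direction.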

Error Handling

Proper error handling lets you distinguish network issues, authentication problems, and model-specific errors.

from mlflow.deployments import get_deploy_client
from mlflow.exceptions import MlflowException

client = get_deploy_client("https://:5000")

try:
    response = client.predict(
        endpoint="chat", inputs={"messages": [{"role": "user", "content": "Hello"}]}
    )
    print(response)
except MlflowException as e:
    print(f"MLflow error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
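Transient failures such as rate limits or timeouts often succeed on retry. One possible sketch of wrapping any gateway call with exponential backoff; `predict_with_retry` is an illustrative helper, not an MLflow API:

```python
import time


def predict_with_retry(call, max_attempts=3, base_delay=1.0, retryable=(Exception,)):
    """Retry a zero-argument callable with exponential backoff.

    `call` is any thunk, e.g. lambda: client.predict(endpoint="chat", inputs=...).
    Re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retryable:
            if attempt == max_attempts:
                raise
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In production you would narrow `retryable` to the specific exception types your gateway raises for transient errors, rather than retrying everything.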

API Reference

Gateway Management

Query the gateway's current configuration and available endpoints programmatically.

from mlflow.deployments import get_deploy_client

client = get_deploy_client("https://:5000")

# List available endpoints
endpoints = client.list_endpoints()
for endpoint in endpoints:
    print(f"Endpoint: {endpoint['name']}")

# Get endpoint details
endpoint_info = client.get_endpoint("chat")
print(f"Model: {endpoint_info.get('model', {}).get('name', 'N/A')}")
print(f"Provider: {endpoint_info.get('model', {}).get('provider', 'N/A')}")

# Note: Route creation, updates, and deletion are typically done
# through configuration file changes, not programmatically
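The dictionaries returned by `list_endpoints` can be post-processed like any other Python data. A sketch that groups endpoint names by provider, assuming the field layout shown above; `summarize_endpoints` is an illustrative helper, and real payloads may differ by provider:

```python
def summarize_endpoints(endpoints):
    """Group endpoint names by model provider from list_endpoints-style dicts."""
    # Field names ('name', 'model', 'provider') mirror the example above.
    summary = {}
    for ep in endpoints:
        provider = ep.get("model", {}).get("provider", "unknown")
        summary.setdefault(provider, []).append(ep.get("name"))
    return summary
```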

Health Monitoring

Monitor the gateway's availability and responsiveness in production deployments.

import requests

try:
    # requests has no default timeout; set one so the check cannot hang
    response = requests.get("https://:5000/health", timeout=5)
    print(f"Status: {response.status_code}")
    if response.status_code == 200:
        print("Gateway is healthy")
except requests.RequestException as e:
    print(f"Health check failed: {e}")
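For startup scripts or readiness probes, the health check can be polled until it passes. A sketch with an injectable check callable so the polling logic stays testable without a live gateway; `wait_until_healthy` is an illustrative helper:

```python
import time


def wait_until_healthy(check, retries=5, interval=2.0):
    """Poll a health check until it reports healthy or retries are exhausted.

    `check` is a zero-argument callable returning True when healthy, e.g.
    lambda: requests.get(health_url, timeout=5).status_code == 200.
    """
    for _ in range(retries):
        try:
            if check():
                return True
        except Exception:
            # Treat connection errors the same as an unhealthy response
            pass
        time.sleep(interval)
    return False
```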

Next Steps