Model Signatures and Input Examples

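Model signatures and input examples are foundational components that define how your models should be used, ensuring consistent and reliable interactions across the MLflow ecosystem.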

What are model signatures and input examples?

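Model signature - Defines the expected format of model inputs, outputs, and parameters. Think of it as a contract that specifies exactly what data your model expects and what it will return.

Model input example - Provides a concrete example of valid model input. This helps developers understand the required data format and verify that the model works correctly.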

Model signatures comparison

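Why do they matter?

Model signatures and input examples provide several key benefits:

Consistency: ensures that all model interactions follow the same data format
Validation: catches data format errors before they reach the model
Documentation: serves as living documentation of how to use the model
Deployment safety: lets MLflow deployment tools validate requests automatically
UI integration: lets the MLflow UI display clear model requirements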

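Databricks Unity Catalog requirements

Model signatures are required when registering models in Databricks Unity Catalog. Unity Catalog enforces concrete type definitions for all registered models and rejects models that lack a proper signature. Always include a signature if you plan to register models in a Databricks environment.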

# ✅ Required for Databricks registration
mlflow.sklearn.log_model(
    model,
    name="my_model",
    input_example=X_sample,  # Generates required signature
    signature=signature,  # Or provide explicit signature
)

# ❌ Will fail in Databricks Unity Catalog
mlflow.sklearn.log_model(model, name="my_model")  # No signature

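Quick start: adding a signature to your model

The simplest way to add a signature is to provide an input example when logging the model: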

import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Load data and train model
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
model = RandomForestClassifier().fit(X, y)

with mlflow.start_run():
    # The input example automatically generates a signature
    mlflow.sklearn.log_model(
        model, name="iris_model", input_example=X.iloc[[0]]  # First row as example
    )

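MLflow automatically:

Infers the signature from your input example
Validates that the model works with the example
Stores the signature and the example alongside your model

Automatic signature inference

When you provide an input_example while logging a model, MLflow automatically generates the model signature. This works for all model flavors and is the recommended approach for most use cases.

Understanding model signatures

A model signature consists of three components:

Input schema
Output schema
Parameters schema

Defines the data structure and types your model expects: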

import numpy as np
from mlflow.types.schema import ColSpec, Schema, TensorSpec

# Column-based signature (DataFrames)
input_schema = Schema(
    [
        ColSpec("double", "sepal_length"),
        ColSpec("double", "sepal_width"),
        ColSpec("string", "species", required=False),  # Optional field
    ]
)

# Tensor-based signature (NumPy arrays)
input_schema = Schema(
    [TensorSpec(np.dtype(np.float32), (-1, 28, 28, 1))]  # Batch of 28x28 images
)

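Key features: support for both tabular (DataFrame) and tensor (NumPy) data, optional fields via required=False, and rich data types including arrays and objects.

Specifies what your model returns: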

# Single prediction column
output_schema = Schema([ColSpec("long", "prediction")])

# Multiple outputs
output_schema = Schema(
    [
        ColSpec("double", "probability"),
        ColSpec("string", "predicted_class"),
        ColSpec("long", "confidence_score"),
    ]
)

# Tensor output
output_schema = Schema(
    [TensorSpec(np.dtype(np.float32), (-1, 10))]  # 10-class probabilities
)

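Defines optional inference parameters (such as temperature and max_length):

from mlflow.models import ModelSignature
from mlflow.types.schema import ParamSchema, ParamSpec

# Define inference parameters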
params_schema = ParamSchema(
    [
        ParamSpec("temperature", "double", 0.7),  # Default temperature
        ParamSpec("max_tokens", "long", 100),  # Default max tokens
        ParamSpec("stop_words", "string", [".", "!"], (-1,)),  # List parameter
    ]
)

# Use in model signature
signature = ModelSignature(
    inputs=input_schema, outputs=output_schema, params=params_schema
)

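Common parameters: temperature controls the randomness of generation, max_length/max_tokens limit output length, top_k and top_p control the sampling strategy, and repetition_penalty reduces repetitive output.

Signature types overview

MLflow supports two main signature types:

Column-based signatures - for tabular data (DataFrames, dictionaries)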

# Perfect for traditional ML models
{"feature_1": 1.5, "feature_2": "category_a", "feature_3": [1, 2, 3]}

Tensor-based signatures - for array data (images, audio, embeddings)

# Perfect for deep learning models
np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [1, 2, 3]]])  # Shape: (2, 2, 3)

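Type hints for model signatures

Version compatibility

Type hint support was introduced in MLflow 2.20.0. If you are using an earlier version of MLflow, see the Working with signatures section.

You can use Python type hints to define a model signature automatically and enable data validation. This gives you a more Pythonic way to specify your model's interface while still getting automatic validation and schema inference.

Type hints quick start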

import mlflow
from typing import List, Dict, Optional
import pydantic


class Message(pydantic.BaseModel):
    role: str
    content: str
    metadata: Optional[Dict[str, str]] = None


class CustomModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: List[Message]) -> List[str]:
        # Signature automatically inferred from type hints!
        return [msg.content for msg in model_input]


# Log model - signature is auto-generated from type hints
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="chat_model",
        python_model=CustomModel(),
        input_example=[
            {"role": "user", "content": "Hello"}
        ],  # Validates against type hints
    )

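Key benefits

Automatic validation: validates input data against the type hints at runtime
Schema inference: generates the model signature automatically from type annotations
Type safety: catches type mismatches before they reach your model
IDE support: better autocompletion and error detection during development
Documentation: type hints serve as self-documenting code
Consistency: PythonModel instances and loaded PyFunc models apply the same validation

When should you use type hints?

✅ Recommended for: complex data structures (chat messages, tool definitions, nested objects), models that require strict input validation, teams following modern Python development practices, and GenAI and LLM applications with structured inputs.

⚠️ Consider alternatives for: simple tabular data (DataFrames work well with input examples), legacy codebases that don't use type hints, and models whose input structure is highly dynamic.

Input type requirements

Signature interface

The input type hint must be List[...] because PythonModel expects batched data: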

# ✅ Correct - Always use List wrapper
def predict(self, model_input: List[str]) -> List[str]:
    ...


def predict(self, model_input: List[Message]) -> List[Dict]:
    ...


# ❌ Incorrect - Missing List wrapper
def predict(self, model_input: str) -> str:
    ...


def predict(self, model_input: Message) -> Dict:
    ...

Basic types

List[str]  # String inputs
List[int]  # Integer inputs
List[float]  # Float inputs
List[bool]  # Boolean inputs
List[bytes]  # Binary data
List[datetime.datetime]  # Timestamps

Collection types

List[List[str]]  # Nested lists
List[Dict[str, int]]  # Dictionaries
List[Dict[str, List[str]]]  # Complex nested structures

Union and optional types

List[Union[int, str]]  # Multiple possible types (becomes AnyType)
List[Optional[str]]  # Optional fields (in Pydantic models only)
List[Any]  # Any type (no validation)

Pydantic models (recommended)

class UserData(pydantic.BaseModel):
    name: str
    age: int
    email: Optional[str] = None  # Optional with default
    preferences: List[str] = []  # List with default


List[UserData]  # Clean, validated structure

Type hint to schema mapping

Type hint	Generated schema
`List[str]`	`Schema([ColSpec(type=DataType.string)])`
`List[List[str]]`	`Schema([ColSpec(type=Array(DataType.string))])`
`List[Dict[str, str]]`	`Schema([ColSpec(type=Map(DataType.string))])`
`List[Union[int, str]]`	`Schema([ColSpec(type=AnyType())])`
`List[Message]`	`Schema([ColSpec(type=Object(...))])`
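To make the mapping above concrete, here is a minimal, MLflow-free sketch. It walks a type hint with `typing.get_origin`/`get_args` and produces the same descriptions as the table. `describe_input_hint` and `describe_hint` are hypothetical helpers written for illustration; they are not MLflow's inference code, and they cover only the table rows shown (no Pydantic models, no special handling of Optional).

```python
from typing import Any, Dict, List, Union, get_args, get_origin

# Hypothetical mapping of primitive hints to MLflow type names (illustration only)
_PRIMITIVES = {str: "string", int: "long", float: "double", bool: "boolean", bytes: "binary"}


def describe_hint(hint: Any) -> str:
    """Describe the column type produced by a single element type hint."""
    if hint in _PRIMITIVES:
        return _PRIMITIVES[hint]
    origin = get_origin(hint)
    if origin is list:
        (item,) = get_args(hint)
        return f"Array({describe_hint(item)})"
    if origin is dict:
        key, value = get_args(hint)
        if key is not str:
            raise TypeError("dictionary keys must be str")
        return f"Map({describe_hint(value)})"
    if origin is Union:
        # A union has no single type, so it maps to AnyType (no validation)
        return "AnyType"
    raise TypeError(f"unsupported type hint: {hint!r}")


def describe_input_hint(hint: Any) -> str:
    """The outer hint must be List[...] because predict() receives a batch."""
    if get_origin(hint) is not list:
        raise TypeError("model_input must be annotated as List[...]")
    (element,) = get_args(hint)
    return describe_hint(element)


print(describe_input_hint(List[Dict[str, List[str]]]))  # Map(Array(string))
```

The outer `List[...]` check runs first and is kept separate, because it reflects the batching requirement rather than the element type.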

Basic Pydantic usage

import mlflow
import pydantic
from typing import Optional, List, Dict


class Message(pydantic.BaseModel):
    role: str
    content: str
    timestamp: Optional[str] = None


class CustomModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: List[Message]) -> List[str]:
        return [f"{msg.role}: {msg.content}" for msg in model_input]


model = CustomModel()

# Both work - dicts are converted automatically
model.predict([Message(role="user", content="Hi")])  # Pydantic object
model.predict([{"role": "user", "content": "Hi"}])  # Dict (auto-converted)

Complex nested models

class FunctionParams(pydantic.BaseModel):
    properties: Dict[str, str]
    type: str = "object"
    required: Optional[List[str]] = None


class ToolDefinition(pydantic.BaseModel):
    name: str
    description: Optional[str] = None
    parameters: Optional[FunctionParams] = None


class ChatRequest(pydantic.BaseModel):
    messages: List[Message]
    tools: Optional[List[ToolDefinition]] = None
    temperature: float = 0.7


@mlflow.pyfunc.utils.pyfunc
def advanced_predict(model_input: List[ChatRequest]) -> List[Dict[str, str]]:
    results = []
    for request in model_input:
        # Type validation ensures request.messages exists and is properly typed
        response = {"response": f"Processed {len(request.messages)} messages"}
        if request.tools:
            response["tools_count"] = str(len(request.tools))
        results.append(response)
    return results

Flexible base classes

class BaseMessage(pydantic.BaseModel):
    model_config = pydantic.ConfigDict(extra="allow")  # Allow extra fields

    role: str
    content: str


class SystemMessage(BaseMessage):
    system_prompt: str


class UserMessage(BaseMessage):
    user_id: str


@mlflow.pyfunc.utils.pyfunc
def flexible_predict(model_input: List[BaseMessage]) -> List[str]:
    # Input automatically converted to BaseMessage objects
    # Extra fields from subclasses preserved
    results = []
    for msg in model_input:
        result = f"{msg.role}: {msg.content}"
        if hasattr(msg, "system_prompt"):
            result += f" (system: {msg.system_prompt})"
        elif hasattr(msg, "user_id"):
            result += f" (user: {msg.user_id})"
        results.append(result)
    return results

Pydantic best practices

Always provide default values for optional fields

# ✅ Good - Optional fields have defaults
class Message(pydantic.BaseModel):
    role: str
    content: str
    metadata: Optional[Dict[str, str]] = None
    timestamp: Optional[str] = None


# ❌ Bad - Optional field without default
class Message(pydantic.BaseModel):
    role: str
    content: str
    metadata: Optional[Dict[str, str]]  # No default: treated as required, so inputs omitting it fail validation

Automatic data validation

Type hints enable automatic validation for both PythonModel instances and loaded PyFunc models:

model = CustomModel()

# ✅ Works: Pydantic objects
input_data = [Message(role="user", content="Hello")]
result = model.predict(input_data)

# ✅ Works: Dictionaries (auto-converted to Pydantic objects)
input_data = [{"role": "user", "content": "Hello"}]
result = model.predict(input_data)

# ❌ Fails: Missing required fields
input_data = [{"role": "user"}]  # Missing 'content'
model.predict(input_data)  # Raises validation error

# ❌ Fails: Wrong data type
input_data = ["hello"]  # Expected dict/Pydantic object
model.predict(input_data)  # Raises validation error

Data conversion examples

# Input: Dictionary
input_dict = {"role": "system", "content": "Hello", "metadata": {"source": "api"}}

# Automatically converted to: Message object
# Message(role="system", content="Hello", metadata={"source": "api"})

# Works for nested structures too
complex_input = {
    "messages": [{"role": "user", "content": "Hi"}],
    "tools": [{"name": "search", "description": "Web search"}],
    "temperature": 0.5,
}
# Automatically converted to: ChatRequest object with nested Message and ToolDefinition objects
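The conversion above can be approximated without Pydantic. The sketch below uses a standard-library `dataclass` as a stand-in for the `Message` model. `to_message` is a hypothetical helper that mimics the behavior described in this section: existing objects pass through, dicts are converted to objects, and missing required fields or non-dict items raise an error. Unlike Pydantic, it does not validate the type of each field.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class Message:
    role: str
    content: str
    metadata: Optional[Dict[str, str]] = None


def to_message(item: Any) -> Message:
    """Convert one batch element to a Message, roughly like the automatic conversion above."""
    if isinstance(item, Message):
        return item  # already the right type
    if not isinstance(item, dict):
        raise TypeError(f"expected a dict or Message, got {type(item).__name__}")
    declared = fields(Message)
    missing = [
        f.name
        for f in declared
        if f.name not in item and f.default is MISSING and f.default_factory is MISSING
    ]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # This sketch drops keys that are not declared fields
    known = {f.name for f in declared}
    return Message(**{k: v for k, v in item.items() if k in known})


batch = [{"role": "system", "content": "Hello", "metadata": {"source": "api"}}]
print([to_message(x) for x in batch])
```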

Validation error examples

# Missing required field
try:
    model.predict([{"role": "system"}])  # Missing 'content'
except Exception as e:
    print(e)
    # Output: 1 validation error for Message
    # content
    #   Field required [type=missing, input_value={'role': 'system'}, input_type=dict]

# Wrong data type
try:
    model.predict(["hello"])  # Expected dict/object
except Exception as e:
    print(e)
    # Output: Failed to validate data against type hint `list[Message]`, invalid elements:
    # [('hello', "Expecting example to be a dictionary or pydantic model instance...")]

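Validation scope

Output validation

MLflow validates input data against type hints but does not validate model outputs. Output type hints are used only to infer the model signature.

TypeFromExample

Use this when you want the type to be inferred automatically from the input example: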

import mlflow
from mlflow.types.type_hints import TypeFromExample


class FlexibleModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: TypeFromExample):
        # Type determined by input_example at logging time
        return [
            item.upper() if isinstance(item, str) else str(item) for item in model_input
        ]


# Input example determines the expected type
with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        name="flexible_model",
        python_model=FlexibleModel(),
        input_example=["sample", "data"],  # Expects List[str]
    )

# At inference, validates against List[str] type
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)
result = loaded_model.predict(["hello", "world"])  # ✅ Works

Legacy type hints (no validation)

These type hints are supported but provide neither validation nor schema inference:

# Supported but no validation
def predict(self, model_input: pd.DataFrame) -> pd.DataFrame:
    ...


def predict(self, model_input: np.ndarray) -> np.ndarray:
    ...


def predict(self, model_input: scipy.sparse.csr_matrix):
    ...


# You must provide explicit signature or input_example
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        name="legacy_model",
        python_model=model,
        input_example=sample_dataframe,  # Required for legacy types
    )

Using the @pyfunc decorator

For callable functions (rather than classes):

from mlflow.pyfunc.utils import pyfunc


@pyfunc
def predict(model_input: List[Message]) -> List[str]:
    return [msg.content for msg in model_input]


# Same validation works as with PythonModel
predict([{"role": "user", "content": "Hi"}])  # ✅ Auto-converts dict to Message
predict(["hello"])  # ❌ Validation error

Union type behavior

# Union types become AnyType (no validation)
def predict(self, model_input: List[Union[str, int]]) -> List[str]:
    # MLflow infers this as List[AnyType] - no validation performed
    return [str(item) for item in model_input]


# Better approach: Use Pydantic discriminated unions for validation
from typing import Literal


class TextInput(pydantic.BaseModel):
    type: Literal["text"] = "text"
    content: str


class NumberInput(pydantic.BaseModel):
    type: Literal["number"] = "number"
    value: int


# Discriminated union with validation
def predict(self, model_input: List[Union[TextInput, NumberInput]]) -> List[str]:
    ...
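A discriminated union works because the `type` field tells the validator which model to use. The sketch below shows that dispatch idea with plain dataclasses instead of Pydantic. `parse_item` and `predict_items` are hypothetical names, and the hand-written `__post_init__` checks stand in for the validation that Pydantic would perform.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class TextInput:
    content: str
    type: str = "text"

    def __post_init__(self) -> None:
        if not isinstance(self.content, str):
            raise TypeError("content must be a str")


@dataclass
class NumberInput:
    value: int
    type: str = "number"

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError("value must be an int")


# The discriminator value selects the class used to validate each item
_BY_TYPE = {"text": TextInput, "number": NumberInput}


def parse_item(raw: Dict) -> Union[TextInput, NumberInput]:
    cls = _BY_TYPE.get(raw.get("type"))
    if cls is None:
        raise ValueError(f"unknown type: {raw.get('type')!r}")
    return cls(**raw)


def predict_items(model_input: List[Dict]) -> List[str]:
    items = [parse_item(raw) for raw in model_input]
    return [i.content if isinstance(i, TextInput) else str(i.value) for i in items]
```

Because each item is validated by exactly one class, a wrong value type such as `{"type": "number", "value": "3"}` fails instead of being accepted silently, which is the behavior a plain `Union` mapped to AnyType would not give you.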

Serving models that use type hints

When serving a model that uses type hints, always wrap the payload under the inputs key of the JSON request:

# Start local server
mlflow models serve -m runs:/<run_id>/model --env-manager local

# Correct request format
curl -X POST http://127.0.0.1:5000/invocations \
  -H 'Content-Type: application/json' \
  -d '{"inputs": [{"role": "user", "content": "Hello"}]}'

# ❌ Incorrect - missing inputs wrapper
curl -X POST http://127.0.0.1:5000/invocations \
  -H 'Content-Type: application/json' \
  -d '[{"role": "user", "content": "Hello"}]'

Deployment best practices

Input example validation

# Always provide input examples that match your type hints
with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        name="chat_model",
        python_model=CustomModel(),
        input_example=[{"role": "user", "content": "test"}],  # Matches List[Message]
    )

# MLflow validates the input_example against type hints at logging time

Testing before deployment

# Test locally first
model = CustomModel()
test_input = [{"role": "user", "content": "test"}]

# Verify validation works
try:
    result = model.predict(test_input)
    print("✅ Validation passed")
except Exception as e:
    print(f"❌ Validation failed: {e}")

# Test loaded model
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)
result = loaded_model.predict(test_input)

Production considerations

Error handling

import logging

logger = logging.getLogger(__name__)


class RobustModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: List[Message]) -> List[str]:
        try:
            return [msg.content for msg in model_input]
        except Exception as e:
            # Log validation errors for monitoring
            logger.error(f"Prediction failed: {e}")
            raise

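Performance: type validation adds only minimal overhead, and Pydantic validation is highly optimized. If you repeatedly validate inputs with the same structure, consider caching.

Type hint best practices

Development workflow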

# ✅ Recommended pattern
class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: List[MyPydanticModel]) -> List[str]:
        # Clear type annotations
        # Automatic validation
        # Good IDE support
        return [process(item) for item in model_input]

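Key guidelines

Use Pydantic models for complex data structures
Set default values for optional fields in Pydantic models
Do not pass an explicit signature argument when using type hints
Always provide an input example that matches your type hints
Use TypeFromExample when you want flexibility without explicit types
Test validation locally before deployment

Important notes

Never pass an explicit signature argument when using type hints - MLflow uses the inferred signature and emits a warning if the two don't match
Union types become AnyType - use Pydantic discriminated unions for proper validation
TypeFromExample and legacy type hints require an input example

Data types and examples

Column-based data types
Tensor-based data types
Inference parameters

Basic types

Python to MLflow type mapping

Type limitations

These types can only be used as scalar definitions or one-dimensional arrays. Mixed types are not allowed.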

Python type	MLflow type	Example	Notes
`str`	`string`	`"hello world"`
`int`	`long`	`42`	64-bit integer
`np.int32`	`integer`	`np.int32(42)`	32-bit integer
`float`	`double`	`3.14159`	64-bit float
`np.float32`	`float`	`np.float32(3.14)`	32-bit float
`bool`	`boolean`	`True`
`np.bool_`	`boolean`	`np.bool_(True)`	NumPy boolean
`datetime`	`datetime`	`pd.Timestamp("2023-01-01")`
`bytes`	`binary`	`b"binary data"`
`bytearray`	`binary`	`bytearray(b"data")`
`np.bytes_`	`binary`	`np.bytes_(b"data")`	NumPy bytes
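For the standard-library rows of the table, the mapping can be written as a small function. `mlflow_type_name` is a hypothetical helper for illustration, and NumPy and pandas types are left out. Note that `bool` must be checked before `int`, because in Python `bool` is a subclass of `int`.

```python
import datetime


def mlflow_type_name(value: object) -> str:
    """Return the MLflow type name that the table above lists for a scalar value."""
    if isinstance(value, bool):  # check before int: True is also an int
        return "boolean"
    if isinstance(value, int):
        return "long"  # Python ints map to 64-bit long
    if isinstance(value, float):
        return "double"  # Python floats are 64-bit
    if isinstance(value, str):
        return "string"
    if isinstance(value, (bytes, bytearray)):
        return "binary"
    if isinstance(value, datetime.datetime):
        return "datetime"
    raise TypeError(f"no mapping for {type(value).__name__}")
```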

Composite types

Arrays (lists/NumPy arrays)

{
    "simple_list": ["a", "b", "c"],
    "nested_array": [[1, 2], [3, 4], [5, 6]],
    "numpy_array": np.array([1.1, 2.2, 3.3]),
}

Objects (dictionaries)

{"user_profile": {"name": "Alice", "age": 30, "preferences": ["sports", "music"]}}

Optional fields

# Include None values to make fields optional
pd.DataFrame(
    {
        "required_field": [1, 2, 3],
        "optional_field": [1.0, None, 3.0],  # This becomes optional
    }
)
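The same rule applies to a list of dictionaries: a column becomes optional if it is missing or `None` in some rows. `infer_required_flags` is a hypothetical helper, not an MLflow function. It sketches only the required-versus-optional decision and ignores types.

```python
from typing import Any, Dict, List


def infer_required_flags(rows: List[Dict[str, Any]]) -> Dict[str, bool]:
    """Map each column to True (required) or False (optional)."""
    columns: Dict[str, bool] = {}
    for row in rows:
        for name in row:
            columns.setdefault(name, True)
    for name in columns:
        # A missing key or an explicit None in any row makes the column optional
        if any(row.get(name) is None for row in rows):
            columns[name] = False
    return columns


print(infer_required_flags([{"name": "Alice", "email": "alice@example.com"}, {"name": "Bob", "email": None}]))
# {'name': True, 'email': False}
```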

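Compatibility notes

Version compatibility

Version requirements

Array and Object types: require MLflow ≥ 2.10.0
Spark ML vectors: require MLflow ≥ 2.15.0
AnyType: requires MLflow ≥ 2.19.0

NumPy data types

Tensor signatures support all NumPy data types, including: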

np.float32  # 32-bit float
np.float64  # 64-bit float (double)
np.int8  # 8-bit integer
np.int32  # 32-bit integer
np.uint8  # Unsigned 8-bit (common for images)
np.bool_  # Boolean

Shape specification

Use -1 for variable dimensions (typically the batch size):

# Image batch: variable batch size, 28x28 pixels, 1 channel
TensorSpec(np.dtype(np.uint8), (-1, 28, 28, 1))

# Text embeddings: variable batch size, 768-dimensional vectors
TensorSpec(np.dtype(np.float32), (-1, 768))

# Fixed shape: exactly 10 classes
TensorSpec(np.dtype(np.float32), (10,))
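Shape matching with `-1` amounts to a per-dimension comparison in which `-1` accepts any size. `shape_matches` is a hypothetical helper that illustrates the rule on plain tuples. It is not MLflow's enforcement code.

```python
from typing import Sequence


def shape_matches(expected: Sequence[int], actual: Sequence[int]) -> bool:
    """True if `actual` fits `expected`, where -1 in `expected` means 'any size'."""
    if len(expected) != len(actual):
        return False  # the number of dimensions must agree
    return all(e == -1 or e == a for e, a in zip(expected, actual))


print(shape_matches((-1, 28, 28, 1), (32, 28, 28, 1)))  # True
print(shape_matches((-1, 768), (4, 512)))  # False
```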

Common patterns

Computer vision

# Grayscale images
TensorSpec(np.dtype(np.uint8), (-1, 28, 28, 1))

# RGB images
TensorSpec(np.dtype(np.uint8), (-1, 224, 224, 3))

# Feature maps
TensorSpec(np.dtype(np.float32), (-1, 512, 7, 7))

Natural language processing

# Token IDs
TensorSpec(np.dtype(np.int64), (-1, 512))

# Embeddings
TensorSpec(np.dtype(np.float32), (-1, 768))

# Attention masks
TensorSpec(np.dtype(np.bool_), (-1, 512))

Parameter specification

Parameters let you customize model behavior at inference time:

ParamSpec(
    name="temperature",  # Parameter name
    dtype="double",  # Data type
    default=0.7,  # Default value
    shape=None,  # Shape (None for scalars, (-1,) for lists)
)

Supported parameter types

Parameters must be scalars or one-dimensional arrays. Multidimensional arrays are not supported for inference parameters.

MLflow type	Python type	Scalar example	1-D array example
`string`	`str`	`"gpt-4"`	`["stop1", "stop2"]`
`long`	`int` (64-bit)	`100`	`[100, 200, 300]`
`integer`	`int` (32-bit)	`50`	`[10, 20, 30]`
`double`	`float` (64-bit)	`0.7`	`[0.1, 0.5, 0.9]`
`float`	`float` (32-bit)	`0.5`	`[0.1, 0.2, 0.3]`
`boolean`	`bool`	`True`	`[True, False, True]`
`datetime`	`datetime`	`datetime.now()`	`[datetime1, datetime2]`
`binary`	`bytes`	`b"data"`	`[b"data1", b"data2"]`

Common parameter patterns

Text generation

params_schema = ParamSchema(
    [
        ParamSpec("temperature", "double", 0.7),
        ParamSpec("max_tokens", "long", 100),
        ParamSpec("top_p", "double", 0.9),
        ParamSpec("frequency_penalty", "double", 0.0),
        ParamSpec("stop_sequences", "string", [], (-1,)),  # List of strings
    ]
)

Model selection

params_schema = ParamSchema(
    [
        ParamSpec("model_name", "string", "default"),
        ParamSpec("use_cache", "boolean", True),
        ParamSpec("timeout", "long", 30),
    ]
)

Using parameters at inference time

# Model with parameters
loaded_model = mlflow.pyfunc.load_model(model_uri)

# Use default parameters
result = loaded_model.predict(input_data)

# Override specific parameters
result = loaded_model.predict(input_data, params={"temperature": 0.1, "max_tokens": 50})

Signature enforcement and validation

Signature enforcement process

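MLflow automatically validates inputs against the model signature when you:

Load the model as a PyFunc (mlflow.pyfunc.load_model)
Use MLflow deployment tools
Serve the model through MLflow's REST API

Validation rules

Input validation

Required fields: must be present, otherwise validation fails
Optional fields: may be omitted without an error
Extra fields: ignored (not passed to the model)
Type conversion: safe conversions are applied where possible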
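The first three rules can be sketched as a single function that processes one input record. `enforce_columns` and its `(name, required)` spec format are hypothetical and for illustration only. Type conversion is left out of this sketch.

```python
from typing import Any, Dict, List, Tuple


def enforce_columns(record: Dict[str, Any], spec: List[Tuple[str, bool]]) -> Dict[str, Any]:
    """Apply the required / optional / extra-field rules to one input record."""
    missing = [name for name, required in spec if required and name not in record]
    if missing:
        raise ValueError(f"missing required input fields: {missing}")
    declared = {name for name, _ in spec}
    # Optional fields may be absent; undeclared (extra) fields are dropped
    return {key: value for key, value in record.items() if key in declared}


spec = [("feature_1", True), ("feature_2", True), ("feature_3", False)]
print(enforce_columns({"feature_1": 1.0, "feature_2": "A", "debug": True}, spec))
# {'feature_1': 1.0, 'feature_2': 'A'}
```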

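Parameter validation

Type checking: parameters must match the specified type
Shape validation: list parameters are checked for the correct shape
Default values: applied when a parameter is not provided
Unknown parameters: produce a warning rather than a failure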
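The four parameter rules can be sketched in plain Python. The `PARAM_SPECS` table and the `resolve_params` function are hypothetical and only mirror the rules listed above. MLflow's real implementation may differ, for example in which numeric conversions it accepts; accepting ints for float parameters is an assumption of this sketch.

```python
import warnings
from typing import Any, Dict, Optional

# Hypothetical spec: name -> (expected Python type, default, is_1d_list)
PARAM_SPECS = {
    "temperature": (float, 0.7, False),
    "max_tokens": (int, 100, False),
    "stop_sequences": (str, [], True),
}


def _matches(value: Any, expected: type) -> bool:
    if isinstance(value, bool):
        return expected is bool  # bool is a subclass of int; keep it separate
    if expected is float:
        return isinstance(value, (int, float))  # assumption: int widens to float
    return isinstance(value, expected)


def resolve_params(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    overrides = overrides or {}
    for name in overrides:
        if name not in PARAM_SPECS:
            warnings.warn(f"unknown parameter {name!r} ignored")  # warn, don't fail
    resolved = {}
    for name, (expected, default, is_list) in PARAM_SPECS.items():
        value = overrides.get(name, default)  # default applies when not provided
        if is_list:
            if not isinstance(value, list):
                raise TypeError(f"{name} must be a 1-D list")
            if not all(_matches(v, expected) for v in value):
                raise TypeError(f"{name} must contain only {expected.__name__} values")
            value = list(value)  # copy so callers cannot mutate the default
        elif not _matches(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}")
        resolved[name] = value
    return resolved
```

Nested lists fail the element check, which is how this sketch enforces the one-dimensional limit for list parameters.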

Handling common issues

Integer columns with missing values

# ❌ Problem: Integer column with NaN becomes float, causing type mismatch
df = pd.DataFrame({"int_col": [1, 2, None]})  # Becomes float64

# ✅ Solution: Define as double from the start
df = pd.DataFrame({"int_col": [1.0, 2.0, None]})  # Stays float64

Type conversion examples

# ✅ Safe conversions (allowed)
int → long     # 32-bit to 64-bit integer
int → double   # Integer to float
float → double # 32-bit to 64-bit float

# ❌ Unsafe conversions (rejected)
long → double  # Potential precision loss
string → int   # No automatic parsing
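The rule behind these examples is that only widening conversions are accepted. The sketch below encodes the listed pairs using MLflow type names; `integer` is the 32-bit type written as `int` above, and `can_convert` is a hypothetical helper. The sketch also shows concretely why `long → double` is rejected: a 64-bit float has a 53-bit significand, so some large integers cannot be represented exactly.

```python
# Widening conversions listed above, using MLflow type names
SAFE_CONVERSIONS = {("integer", "long"), ("integer", "double"), ("float", "double")}


def can_convert(source: str, target: str) -> bool:
    """Hypothetical check: identical types or a listed widening conversion."""
    return source == target or (source, target) in SAFE_CONVERSIONS


# 2**53 + 1 is a valid 64-bit integer but rounds to 2**53 as a double
big = 2**53 + 1
print(int(float(big)) == big)  # False: precision was lost
```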

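Working with signatures

Logging models with signatures
Updating existing models
Advanced signature patterns

Automatic signature inference

The simplest approach - provide an input example: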

import mlflow
from sklearn.ensemble import RandomForestClassifier

# Train your model
model = RandomForestClassifier().fit(X_train, y_train)

with mlflow.start_run():
    mlflow.sklearn.log_model(
        model,
        name="my_model",
        input_example=X_train.iloc[[0]],  # Signature inferred automatically
    )

Creating signatures manually

For more control, create the signature explicitly:

from mlflow.models import ModelSignature
from mlflow.types.schema import Schema, ColSpec

# Define input schema
input_schema = Schema(
    [
        ColSpec("double", "feature_1"),
        ColSpec("string", "feature_2"),
        ColSpec("long", "feature_3", required=False),  # Optional
    ]
)

# Define output schema
output_schema = Schema([ColSpec("double", "prediction")])

# Create signature
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Log with explicit signature
with mlflow.start_run():
    mlflow.sklearn.log_model(model, name="my_model", signature=signature)

Signature inference helper

Use infer_signature for custom workflows:

from mlflow.models import infer_signature

# Generate predictions for signature inference
predictions = model.predict(X_test)

# Infer signature from data
signature = infer_signature(X_test, predictions)

# Log with inferred signature
with mlflow.start_run():
    mlflow.sklearn.log_model(model, name="my_model", signature=signature)

Adding a signature to an already-logged model

Use set_signature to add or update the signature of an existing model:

import mlflow
from mlflow.models import set_signature, infer_signature

# Load existing model (without signature)
model_uri = "models:/<model_id>"
model = mlflow.pyfunc.load_model(model_uri)

# Create signature from test data
signature = infer_signature(X_test, model.predict(X_test))

# Apply signature to existing model
set_signature(model_uri, signature)

# Verify signature was set
from mlflow.models.model import get_model_info

assert get_model_info(model_uri).signature == signature

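Working with the Model Registry

For registered models, update the signature on the source artifacts and then create a new model version:

from mlflow.client import MlflowClient
from mlflow.models import infer_signature, set_signature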

client = MlflowClient()
model_name = "my_registered_model"
model_version = 1

# Get existing model version
mv = client.get_model_version(name=model_name, version=model_version)

# Update signature on source artifacts
signature = infer_signature(X_test, predictions)
set_signature(mv.source, signature)

# Create new model version with updated signature
client.create_model_version(name=model_name, source=mv.source, run_id=mv.run_id)

GenAI model signatures

For LangChain, OpenAI, and similar models, the signature is inferred automatically when you provide an input example:

# Input example for chat model
input_example = {"messages": [{"role": "user", "content": "What is machine learning?"}]}

# Alternative: an example with optional fields (this reassigns input_example)
input_example = [
    {"name": "Alice", "message": "Hello"},  # name is present
    {"message": "Hi there"},  # name is missing (becomes optional)
]

# Log model - signature auto-generated from input_example
with mlflow.start_run():
    mlflow.langchain.log_model(
        chain,
        name="chat_model",
        input_example=input_example,  # Signature automatically inferred!
    )

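Models with parameters

Include inference parameters in your signature - when you provide both inputs and parameters, the signature is inferred automatically: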

# Input data and parameters
input_data = "Translate to French: Hello world"
params = {"temperature": 0.3, "max_tokens": 50, "stop_sequences": [".", "!"]}

# Create signature with parameters - automatically inferred
signature = infer_signature(
    input_data, model.predict(input_data), params  # Include parameters in signature
)

with mlflow.start_run():
    mlflow.transformers.log_model(model, name="translation_model", signature=signature)

Complex data structures

Handle nested objects and arrays - the signature is inferred automatically from complex input examples:

# Complex input structure
input_example = {
    "user_data": {
        "id": 12345,
        "preferences": ["action", "comedy"],
        "metadata": {"created_date": "2023-01-01", "is_premium": True},
    },
    "context": {"device": "mobile", "location": None},  # Optional field
}

# Signature automatically handles nested structure when provided as input_example
with mlflow.start_run():
    mlflow.pyfunc.log_model(
        python_model=custom_model,
        name="complex_model",
        input_example=input_example,  # Auto-infers complex nested schema
    )

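Input example details

Beyond signature inference, input examples serve several important purposes:

Benefits of input examples

Signature inference: generates the model signature automatically
Model validation: verifies during logging that the model works
Dependency detection: helps identify required packages
Documentation: shows developers the correct input format
Deployment testing: validates the REST endpoint payload format

Input example formats

DataFrame examples
Tensor examples
JSON examples
Examples with parameters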

import pandas as pd

# Single record example
single_record = pd.DataFrame(
    [{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}]
)

# Multiple records example
batch_example = pd.DataFrame(
    [
        {"feature_1": 1.0, "feature_2": "A"},
        {"feature_1": 2.0, "feature_2": "B"},
        {"feature_1": 3.0, "feature_2": "C"},
    ]
)

# Log model with DataFrame example
mlflow.sklearn.log_model(model, name="model", input_example=single_record)

import numpy as np

# Image batch example (MNIST-style)
image_batch = np.random.randint(0, 255, size=(3, 28, 28, 1), dtype=np.uint8)

# Multi-input dictionary
multi_input = {
    "image": np.random.random((2, 224, 224, 3)),
    "metadata": np.array([[1.0, 2.0], [3.0, 4.0]]),
}

# Sparse matrix example
from scipy.sparse import csr_matrix

sparse_example = csr_matrix([[1, 0, 2], [0, 0, 3]])

# Log model with tensor example
mlflow.tensorflow.log_model(model, name="model", input_example=image_batch)

# Dictionary example
dict_example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
}

# List example
list_example = [
    {"text": "First document", "category": "news"},
    {"text": "Second document", "category": "sports"},
]

# Simple scalar
scalar_example = "What is the capital of France?"

# Log model with JSON example
mlflow.langchain.log_model(model, name="model", input_example=dict_example)

# Combine input data with parameters using tuple
input_data = "Translate to Spanish: Good morning"
params = {"temperature": 0.2, "max_length": 50, "do_sample": True}

# Create tuple for logging
input_example = (input_data, params)

# Log model with parameters
mlflow.transformers.log_model(
    model, name="translation_model", input_example=input_example
)

# At inference time
loaded_model = mlflow.pyfunc.load_model(model_uri)

# Use default parameters
result1 = loaded_model.predict(input_data)

# Override parameters
result2 = loaded_model.predict(input_data, params={"temperature": 0.1})
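Conceptually, a tuple input example is split into a data part and a parameters part before the signature is inferred. `split_input_example` is a hypothetical helper that sketches this idea. It is not an MLflow API.

```python
from typing import Any, Dict, Optional, Tuple


def split_input_example(example: Any) -> Tuple[Any, Optional[Dict[str, Any]]]:
    """Separate an (input_data, params) tuple; other examples have no params."""
    if isinstance(example, tuple):
        if len(example) != 2 or not isinstance(example[1], dict):
            raise ValueError("expected a tuple of (input_data, params_dict)")
        return example[0], example[1]
    return example, None


data, params = split_input_example(("Translate to Spanish: Good morning", {"temperature": 0.2}))
```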

Model serving and deployment

Serving input examples

MLflow automatically generates serving-compatible examples:

# When you log a model with input_example
input_example = {"question": "What is MLflow?"}

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        python_model=MyModel(), name="model", input_example=input_example
    )

# MLflow creates two files:
# 1. input_example.json - Original format
# 2. serving_input_example.json - REST API format

Generated files

File	Contents	Purpose
`input_example.json`	`{"question": "What is MLflow?"}`	Original input format
`serving_input_example.json`	`{"inputs": {"question": "What is MLflow?"}}`	REST endpoint format
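The difference between the two files is the `inputs` wrapper that the REST endpoint expects. The sketch below builds that payload with the standard library and prepares, but does not send, the same POST request as the `curl` command in the serving section. `to_serving_payload` and `build_invocation_request` are hypothetical helpers. The sketch only covers JSON-style examples like the dictionary above; DataFrame examples are serialized differently.

```python
import json
import urllib.request
from typing import Any


def to_serving_payload(input_example: Any) -> str:
    """Wrap a JSON-serializable input example under the "inputs" key."""
    return json.dumps({"inputs": input_example})


def build_invocation_request(
    input_example: Any, url: str = "http://127.0.0.1:5000/invocations"
) -> urllib.request.Request:
    """Prepare the POST request; pass it to urllib.request.urlopen() to send it."""
    body = to_serving_payload(input_example).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


req = build_invocation_request({"question": "What is MLflow?"})
print(req.data)  # b'{"inputs": {"question": "What is MLflow?"}}'
```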

Validating serving examples

Test your model before deployment:

from mlflow.models.utils import load_serving_example
from mlflow.models import validate_serving_input

# Load serving example
serving_example = load_serving_example(model_info.model_uri)

# Validate it works
result = validate_serving_input(model_info.model_uri, serving_example)
print(f"Validation result: {result}")

# Test with local server
# mlflow models serve --model-uri <model_uri>
# curl -X POST -H "Content-Type: application/json" \
#      -d '<serving_example>' http://127.0.0.1:5000/invocations

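Signature playground and examples

Explore signature behavior with our interactive examples:

Download the signature examples notebook

Or view the examples directly: Signature examples notebook

Quick reference examples

Basic examples
DataFrame examples
Tensor examples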

from mlflow.models import infer_signature

# Simple dictionary
simple_dict = {"name": "Alice", "age": 30, "active": True}
print(infer_signature(simple_dict))
# → Schema: [name: string, age: long, active: boolean]

# With optional fields
optional_fields = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "email": None},  # email becomes optional
]
print(infer_signature(optional_fields))
# → Schema: [name: string, email: string (optional)]

# Arrays and nested objects
complex_data = {
    "user": {"id": 123, "tags": ["premium", "beta"]},
    "scores": [0.8, 0.9, 0.7],
}
print(infer_signature(complex_data))
# → Nested schema with arrays and objects

import pandas as pd

# Basic DataFrame
df = pd.DataFrame(
    {
        "feature_1": [1.0, 2.0, 3.0],
        "feature_2": ["A", "B", "C"],
        "feature_3": [True, False, True],
    }
)
print(infer_signature(df))
# → Column-based schema

# With missing values (creates optional columns)
df_optional = pd.DataFrame(
    {"required_col": [1, 2, 3], "optional_col": [1.0, None, 3.0]}  # Contains None
)
print(infer_signature(df_optional))
# → optional_col marked as optional

# Mixed data types
df_mixed = pd.DataFrame(
    {
        "numbers": [1, 2, 3],
        "arrays": [[1, 2], [3, 4], [5, 6]],  # Lists in DataFrame
        "objects": [{"a": 1}, {"b": 2}, {"c": 3}],  # Dicts in DataFrame
    }
)
print(infer_signature(df_mixed))
# → Complex schema with Array and Object types

import numpy as np

# Simple tensor
tensor_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(infer_signature(tensor_2d))
# → Tensor(int64, (-1, 3))

# Image-like tensor
image_batch = np.random.randint(0, 255, (10, 28, 28, 1), dtype=np.uint8)
print(infer_signature(image_batch))
# → Tensor(uint8, (-1, 28, 28, 1))

# Multiple tensors
multi_tensor = {
    "image": np.random.random((5, 224, 224, 3)),
    "mask": np.random.randint(0, 2, (5, 224, 224, 1)),
}
print(infer_signature(multi_tensor))
# → Schema with multiple tensor specs

Best practices and tips

Development workflow

Always include input examples

# ✅ Good: Always provide examples
mlflow.sklearn.log_model(model, name="model", input_example=X_sample)

# ❌ Avoid: Logging without examples
mlflow.sklearn.log_model(model, name="model")  # No signature or validation

Test your signatures

# Validate signature works as expected
signature = infer_signature(X_test, y_pred)
loaded_model = mlflow.pyfunc.load_model(model_uri)

# Test with your signature
try:
    result = loaded_model.predict(X_test)
    print("✅ Signature validation passed")
except Exception as e:
    print(f"❌ Signature issue: {e}")

Performance considerations

For large DataFrames

# Use a representative sample for input_example
large_df = pd.DataFrame(...)  # 1M+ rows
sample_df = large_df.sample(n=100, random_state=42)  # Representative sample

mlflow.sklearn.log_model(model, name="model", input_example=sample_df)

For complex objects

# Provide minimal but representative examples
minimal_example = {
    "required_field": "example_value",
    "optional_field": None,  # Shows field is optional
    "array_field": ["sample"],  # Shows it's an array
}

Common pitfalls

Integer handling

# ❌ Problem: Integers with NaN become floats
df = pd.DataFrame({"int_col": [1, 2, None]})  # Type becomes float64

# ✅ Solution: Use consistent types
df = pd.DataFrame({"int_col": [1.0, 2.0, None]})  # Explicit float64

Nested structure consistency

# ❌ Problem: Inconsistent nesting
inconsistent = [
    {"level1": {"level2": "value"}},
    {"level1": "direct_value"},  # Different structure
]

# ✅ Solution: Consistent structure
consistent = [
    {"level1": {"level2": "value1"}},
    {"level1": {"level2": "value2"}},  # Same structure
]
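One way to catch inconsistent nesting before logging is to reduce each record to its structure and compare the results. `structure_of` and `is_consistent` are hypothetical helpers for illustration. This sketch treats `None` as its own type, so records that use `None` to mark optional fields would be reported as inconsistent.

```python
from typing import Any, List


def structure_of(value: Any) -> Any:
    """Replace leaf values with their type names, keeping dict/list nesting."""
    if isinstance(value, dict):
        return {key: structure_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Describe a list by its first element (an empty list has no element type)
        return [structure_of(value[0])] if value else []
    return type(value).__name__


def is_consistent(records: List[Any]) -> bool:
    """True if every record has the same nested structure as the first."""
    structures = [structure_of(r) for r in records]
    return all(s == structures[0] for s in structures)
```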

Type hints for PythonModel (MLflow 2.20.0+)

import mlflow
from typing import Dict, List


class TypedModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input: List[Dict[str, str]]) -> List[str]:
        # Signature automatically inferred from type hints!
        return [item["text"].upper() for item in model_input]

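Troubleshooting

Common error messages

"Missing required input field"

This error occurs when your model expects a required field that is not present in the input data.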

# Example: Model expects field "age" but input only has "name"
input_data = {"name": "Alice"}  # Missing required "age" field

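Solution: include all required fields in your input data, or mark the field as optional by including None values for it in your input example.

"Cannot convert type X to type Y"

This occurs when you pass data of one type where the signature expects another.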

# Example: Trying to pass string where integer expected
input_data = {"score": "85"}  # String value
# But signature expects: {"score": 85}  # Integer value

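Solution: fix your input data types to match the signature, or update the signature if the type change is intentional.

"Tensor shape mismatch"

This error occurs when a tensor input does not match the shape defined in the signature.

# Example: the model expects shape (-1, 784) but receives (10, 28, 28)
input_tensor = np.random.random((10, 28, 28))  # Wrong shape
# The signature expects (-1, 784): each 28x28 image flattened to 784 values

Solution: reshape your input data to match the expected dimensions, or update the signature if the shape requirements have changed.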

Debugging signatures

Use these techniques to diagnose signature-related issues:

# Inspect existing model signature
from mlflow.models import infer_signature
from mlflow.models.model import get_model_info

model_info = get_model_info(model_uri)
print("Current signature:")
print(model_info.signature)

# Compare with a signature inferred from your input data
inferred = infer_signature(your_input_data)
print("Inferred signature:")
print(inferred)

# Check compatibility: `inferred` has no output schema, so compare inputs only
if model_info.signature.inputs != inferred.inputs:
    print("⚠️  Input schemas don't match - consider updating")
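A `!=` check only tells you that two schemas differ. To see where they differ, you can compare column names and types one by one. The sketch below works on plain `{column_name: type_name}` dictionaries, which you would first build from the two schemas. `diff_columns` is a hypothetical helper, not part of MLflow.

```python
from typing import Dict, List


def diff_columns(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """List human-readable differences between two column -> type mappings."""
    problems = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column {name!r}")
        elif actual[name] != dtype:
            problems.append(f"column {name!r}: expected {dtype}, got {actual[name]}")
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected column {name!r}")
    return problems


print(diff_columns({"age": "long", "name": "string"}, {"age": "double", "email": "string"}))
```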

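Additional resources

Signature examples notebook - interactive examples
Model API documentation - complete API reference
Deployment guide - using signatures in production
MLflow Model format - technical specification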
