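MLflow PythonModel Guide

Introduction to MLflow PythonModel

The mlflow.pyfunc module provides the save_model() and log_model() utilities for creating MLflow models with the python_function flavor that contain user-specified code and artifact (file) dependencies.

MLflow PythonModel lets you implement custom model logic while still using MLflow's packaging and deployment capabilities.

There are two ways to define a PythonModel: subclass mlflow.pyfunc.PythonModel, or define a callable. This guide walks through how to define and use a custom PythonModel.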

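Defining a custom PythonModel

Option 1: Subclass PythonModel

The mlflow.pyfunc module provides a generic PythonModel class that you can use to define your own custom model. A model that subclasses it integrates with the other MLflow components.

PythonModel methods

predict: A valid PythonModel must implement the predict method, which defines the model's prediction logic. MLflow calls this method when the model is loaded as a PyFunc model with mlflow.pyfunc.load_model and its predict function is called.
predict_stream: Implement the predict_stream method if the model is intended for streaming environments. MLflow calls this method when the model is loaded as a PyFunc model with mlflow.pyfunc.load_model and predict_stream is called.
load_context: Implement the load_context method if the model needs to load additional context. See load_context() for details.

Tip

Starting with MLflow 2.20.0, the context parameter can be removed from the predict and predict_stream functions if it is unused. For example, def predict(self, model_input, params) is a valid predict function signature.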
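
To illustrate how both signatures can coexist, here is a minimal standard-library sketch of a dispatcher that passes context only when the predict method declares it. This is not MLflow's implementation; call_predict, WithContext, and WithoutContext are hypothetical names used only for illustration.

```python
import inspect


def call_predict(predict_fn, context, model_input, params=None):
    """Call predict_fn, passing `context` only if its signature declares it."""
    accepts_context = "context" in inspect.signature(predict_fn).parameters
    if accepts_context:
        return predict_fn(context, model_input, params)
    return predict_fn(model_input, params)


class WithContext:
    # Signature that keeps the context parameter
    def predict(self, context, model_input, params=None):
        return model_input


class WithoutContext:
    # Signature with the unused context parameter dropped
    def predict(self, model_input, params=None):
        return model_input


# Both calls return the input list unchanged
print(call_predict(WithContext().predict, None, ["a", "b"]))
print(call_predict(WithoutContext().predict, None, ["a", "b"]))
```

The sketch keys on the parameter name, so it only works if the parameter is literally called context.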

Below is a simple PythonModel that takes a list of strings and returns it unchanged.

import mlflow


class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: list[str], params=None) -> list[str]:
        return model_input

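Option 2: Define a callable

Another way to log a PythonModel is to define a callable that takes a single argument and returns the prediction. Passing this callable to mlflow.pyfunc.log_model logs it as a PythonModel.

Tip

Starting with MLflow 2.20.0, you can apply the @pyfunc decorator to the callable to enable input data validation based on type hints. See the section on type hint usage in PythonModel for details.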

from mlflow.pyfunc.utils import pyfunc


@pyfunc
def predict(model_input: list[str]) -> list[str]:
    return model_input

Logging the model

Log the custom model with mlflow.pyfunc.log_model().

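import mlflow

# Example input matching the list[str] type hint of MyModel.predict
input_example = ["a", "b", "c"]

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=MyModel(),
        input_example=input_example,
    )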

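Validating the model before deployment

Before deploying the model, use the mlflow.models.predict() API to validate its dependencies and input data. See MLflow Model Validation for details.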

import mlflow

mlflow.models.predict(
    model_uri=model_info.model_uri,
    input_data=["a", "b", "c"],
    env_manager="uv",
)

You can also load the model locally and validate it by running a prediction.

import mlflow

pyfunc_model = mlflow.pyfunc.load_model(model_info.model_uri)
pyfunc_model.predict(["hello", "world"])

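Deploying the model

The final step in using the model in production is deploying it. Follow the MLflow model deployment guide to deploy the model.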

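Type hint usage in PythonModel

Starting with MLflow 2.20.0, type hints are a valid way to define the model interface. You can use them to declare the model's input and output types. Using type hints has the following benefits:

Data validation: MLflow validates input data against the type hints defined in the model. Input data is validated consistently whether you call a PythonModel instance or a loaded PyFunc model.
Type hint inference: MLflow infers the model's input and output schema from the type hints defined in the model and sets the inferred schema as the signature of the logged model.

Supported type hints

The type hint in a PythonModel input signature must be of the form list[...], because the predict function of a PythonModel expects batched input data. The following type hints are supported as the element type of list[...]:

Primitive types: int, float, str, bool, bytes, datetime.datetime
Collection types: list, dict
Union types: Union[type1, type2, ...] or type1 | type2 | ...
Optional types: Optional[type]
Pydantic models: subclasses of pydantic.BaseModel (fields must be of the supported types listed in this section)
typing.Any: Any

Limitations of type hint usage

Pydantic models: optional fields must have a default value.
Union types: a union of multiple valid types is inferred as AnyType in MLflow, and MLflow performs no data validation for it.
Optional types: an Optional type cannot be used directly in list[...], because the input to the predict function should not be None.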
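
The list[...] requirement and the Optional restriction can be made concrete with a small standard-library sketch. It is not MLflow's validation code: check_input_hint is a hypothetical helper, it reads the Optional rule as rejecting list[Optional[...]], and it only recognizes typing.Union and Optional (not the X | Y syntax).

```python
from typing import Optional, Union, get_args, get_origin


def check_input_hint(hint) -> str:
    """Classify a predict() input type hint using the rules described above."""
    # The input hint must be a parameterized list, e.g. list[str]
    if get_origin(hint) is not list or len(get_args(hint)) != 1:
        return "rejected: input hint must be list[...]"
    (element,) = get_args(hint)
    if get_origin(element) is Union:
        # Optional[X] is Union[X, None]
        if type(None) in get_args(element):
            return "rejected: Optional cannot be the direct element type"
        return "accepted: Union element is inferred as AnyType and not validated"
    return "accepted"


for hint in (list[str], str, list[Optional[str]], list[Union[int, str]]):
    print(hint, "->", check_input_hint(hint))
```

The sketch does not check whether the element type itself is supported; it only covers the shape rules from this section.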

Here are some examples of supported type hints:

list[str], list[int], list[float], list[bool], list[bytes], list[datetime.datetime]
list[list[str]]...
list[dict[str, str]], list[dict[str, int]], list[dict[str, list[str]]]...
list[Union[int, str]], list[str | dict[str, int]]...

The following example uses nested pydantic models as type hints:

from mlflow.pyfunc.utils import pyfunc
import pydantic
from typing import Optional


class Message(pydantic.BaseModel):
    role: str
    content: str


class FunctionParams(pydantic.BaseModel):
    properties: dict[str, str]
    type: str = "object"
    required: Optional[list[str]] = None
    additionalProperties: Optional[bool] = None


class ToolDefinition(pydantic.BaseModel):
    name: str
    description: Optional[str] = None
    parameters: Optional[FunctionParams] = None
    strict: Optional[bool] = None


class ChatRequest(pydantic.BaseModel):
    messages: list[Message]
    tool: Optional[ToolDefinition] = None


@pyfunc
def predict(model_input: list[ChatRequest]) -> list[list[str]]:
    return [[msg.content for msg in request.messages] for request in model_input]


input_example = [ChatRequest(messages=[Message(role="user", content="Hello")])]
print(predict(input_example))  # Output: [['Hello']]

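Using type hints in a PythonModel

To use type hints in a PythonModel, declare the input and output types in the predict function signature. The following PythonModel takes a list of Message objects and returns a list of strings.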

import pydantic
import mlflow


class Message(pydantic.BaseModel):
    role: str
    content: str


class CustomModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: list[Message], params=None) -> list[str]:
        return [msg.content for msg in model_input]

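Type hint data validation in PythonModel

When you subclass mlflow.pyfunc.PythonModel, data validation based on type hints works without any extra code. It applies both to PythonModel instances and to loaded PyFunc models.

The following example demonstrates data validation using the CustomModel defined above.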

model = CustomModel()

# The input_example can be a list of Message objects as defined in the type hint
input_example = [
    Message(role="system", content="Hello"),
    Message(role="user", content="Hi"),
]
print(model.predict(input_example))  # Output: ['Hello', 'Hi']

# The input_example can also be a list of dict with the same schema as Message
input_example = [
    {"role": "system", "content": "Hello"},
    {"role": "user", "content": "Hi"},
]
print(model.predict(input_example))  # Output: ['Hello', 'Hi']

# If your input doesn't match the schema, it will raise an exception
# e.g. content field is missing here, but it's required in the Message definition
model.predict([{"role": "system"}])
# Output: 1 validation error for Message\ncontent\n  Field required [type=missing, input_value={'role': 'system'}, input_type=dict]

# The same data validation works if you log and load the model as pyfunc
model_info = mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=model,
    input_example=input_example,
)
pyfunc_model = mlflow.pyfunc.load_model(model_info.model_uri)
print(pyfunc_model.predict(input_example))

For a callable, use the @pyfunc decorator to enable data validation based on type hints.

from mlflow.pyfunc.utils import pyfunc


@pyfunc
def predict(model_input: list[Message]) -> list[str]:
    return [msg.content for msg in model_input]


# The input_example can be a list of Message objects as defined in the type hint
input_example = [
    Message(role="system", content="Hello"),
    Message(role="user", content="Hi"),
]
print(predict(input_example))  # Output: ['Hello', 'Hi']

# The input_example can also be a list of dict with the same schema as Message
input_example = [
    {"role": "system", "content": "Hello"},
    {"role": "user", "content": "Hi"},
]
print(predict(input_example))  # Output: ['Hello', 'Hi']

# If your input doesn't match the schema, it will raise an exception
# e.g. passing a list of string here will raise an exception
predict(["hello"])
# Output: Failed to validate data against type hint `list[Message]`, invalid elements:
# [('hello', "Expecting example to be a dictionary or pydantic model instance for Pydantic type hint, got <class 'str'>")]

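Note

MLflow does not validate model output against the type hints, but the output type hint is used for model signature inference.

Pydantic model type hint data conversion

For Pydantic model type hints, the input data can be either Pydantic objects or dictionaries that match the Pydantic model's schema. MLflow automatically converts the provided data to the type-hinted object before passing it to the predict function. In the example from the previous section, [{"role": "system", "content": "Hello"}] is converted to [Message(role="system", content="Hello")] inside the predict function.

The following example shows how to use a base class as the type hint while preserving the fields of its subclasses.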

from pydantic import BaseModel, ConfigDict
from mlflow.pyfunc.utils import pyfunc


class BaseMessage(BaseModel):
    # set extra='allow' to allow extra fields in the subclass
    model_config = ConfigDict(extra="allow")

    role: str
    content: str


class SystemMessage(BaseMessage):
    system_prompt: str


class UserMessage(BaseMessage):
    user_prompt: str


@pyfunc
def predict(model_input: list[BaseMessage]) -> list[str]:
    result = []
    for msg in model_input:
        if hasattr(msg, "system_prompt"):
            result.append(msg.system_prompt)
        elif hasattr(msg, "user_prompt"):
            result.append(msg.user_prompt)
    return result


input_example = [
    {"role": "system", "content": "Hello", "system_prompt": "Hi"},
    {"role": "user", "content": "Hi", "user_prompt": "Hello"},
]
print(predict(input_example))  # Output: ['Hi', 'Hello']

Model signature inference based on type hints

When you log a PythonModel with type hints, MLflow automatically infers the model's input and output schema from those type hints.

Note

Do not pass the signature parameter explicitly when logging a PythonModel with type hints. If you do, MLflow still uses the signature inferred from the type hints and emits a warning if the two do not match.

The following table shows how type hints map to the schema in the model signature:

Type hint	Inferred schema
list[str]	Schema([ColSpec(type=DataType.string)])
list[list[str]]	Schema([ColSpec(type=Array(DataType.string))])
list[dict[str, str]]	Schema([ColSpec(type=Map(DataType.string))])
list[Union[int, str]]	Schema([ColSpec(type=AnyType())])
list[Any]	Schema([ColSpec(type=AnyType())])
list[pydantic.BaseModel]	Schema([ColSpec(type=Object([...]))]) # properties based on the pydantic model's fields
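
As a reading aid for the table, the following toy sketch rebuilds the schema strings for the non-pydantic rows by walking a type hint recursively. It only produces text and does not call MLflow; describe_element and describe_schema are hypothetical helpers, and only str, Any, Union, list, and dict are handled.

```python
from typing import Any, Union, get_args, get_origin


def describe_element(hint) -> str:
    """Return the table's textual type for one element type hint."""
    if hint is str:
        return "DataType.string"
    if hint is Any:
        return "AnyType()"
    origin, args = get_origin(hint), get_args(hint)
    if origin is Union:
        # Unions are not narrowed further; they map to AnyType
        return "AnyType()"
    if origin is list:
        return f"Array({describe_element(args[0])})"
    if origin is dict:
        # Only the value type appears in the Map(...) description
        return f"Map({describe_element(args[1])})"
    raise TypeError(f"type hint not covered by this sketch: {hint!r}")


def describe_schema(hint) -> str:
    """Return the table's textual schema for a list[...] input type hint."""
    if get_origin(hint) is not list:
        raise TypeError("input type hint must be list[...]")
    return f"Schema([ColSpec(type={describe_element(get_args(hint)[0])})])"


for hint in (list[str], list[list[str]], list[dict[str, str]], list[Any]):
    print(hint, "->", describe_schema(hint))
```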

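Warning

Pydantic objects cannot be used with the infer_signature function. To use pydantic objects as model input, you must declare the type hint as the pydantic model in the PythonModel's predict function signature.

Using type hints together with an input example when logging the model

When logging a PythonModel, it is recommended to provide an input example that matches the type hints defined in the model. The input example is used to validate the type hints and to check that the predict function works as expected.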

import mlflow

mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=CustomModel(),
    # The input example must match the list[Message] type hint of CustomModel
    input_example=[{"role": "system", "content": "Hello"}],
)

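Querying a Serving Endpoint that hosts a PythonModel with type hints

When querying a Serving Endpoint that hosts a PythonModel with type hints, you must pass the input data under the inputs key in the request body. The following example serves the model locally and queries it: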

mlflow models serve -m runs:/<run_id>/model --env-manager local
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"inputs": [{"role": "system", "content": "Hello"}]}'
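
The same request can be built from Python with only the standard library. This is a minimal sketch that mirrors the curl command above; build_invocation_request is a hypothetical helper, and the URL assumes the local server started by the serve command.

```python
import json
import urllib.request


def build_invocation_request(records, url="http://127.0.0.1:5000/invocations"):
    """Build a POST request whose JSON body wraps the records under "inputs"."""
    body = json.dumps({"inputs": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request([{"role": "system", "content": "Hello"}])
print(request.data.decode("utf-8"))

# Sending the request requires the server from the command above to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```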

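Additional type hints that are allowed but not used for data validation or schema inference

MLflow also supports the following type hints in a PythonModel, but they are not used for data validation or schema inference, so you must provide a valid model signature or input_example when logging the model.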

pandas.DataFrame
pandas.Series
numpy.ndarray
scipy.sparse.csc_matrix
scipy.sparse.csr_matrix

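TypeFromExample type hint usage

MLflow provides a special type hint, TypeFromExample, which converts the input data during PyFunc prediction to match the type of the input example. This is useful if you do not want to declare an explicit type hint for the model input but still want the data at prediction time to conform to the input example's type. To use this feature, you must provide a valid input example when logging the model. Because the predict function expects batched input data, the input example must be one of the following types:

list
pandas.DataFrame
pandas.Series

The following example shows how to use the TypeFromExample type hint: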

import mlflow
from mlflow.types.type_hints import TypeFromExample


class Model(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: TypeFromExample):
        return model_input


with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=Model(),
        input_example=["a", "b", "c"],
    )
pyfunc_model = mlflow.pyfunc.load_model(model_info.model_uri)
assert pyfunc_model.predict(["d", "e", "f"]) == ["d", "e", "f"]

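Warning

If you use neither type hints nor TypeFromExample, MLflow's schema enforcement converts the input data to a pandas DataFrame by default. This may not be what you want if the model expects the same type as the input example. It is strongly recommended to use supported type hints to avoid this conversion and to enable data validation based on the declared type hints.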
