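MLflow PythonModel Guide

Introduction to MLflow PythonModel

The mlflow.pyfunc module provides the save_model() and log_model() utilities for creating MLflow Models with the python_function flavor that contain user-specified code and *artifact* (file) dependencies.

MLflow PythonModel lets you implement custom model logic while still using MLflow's packaging and deployment capabilities.

There are two ways to define a PythonModel: subclass mlflow.pyfunc.PythonModel or define a callable. This guide walks through how to define and use a custom PythonModel.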

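Defining a custom PythonModel

Option 1: Subclass PythonModel

The mlflow.pyfunc module provides a generic PythonModel class that you can use to define your own custom model. A model defined by subclassing it integrates seamlessly with other MLflow components.

Methods of PythonModel

predict: A valid PythonModel must implement the predict method, which defines the model's prediction logic. MLflow calls this method when the model is loaded as a PyFunc model with mlflow.pyfunc.load_model and its predict function is called.
predict_stream: Implement the predict_stream method if the model is intended for streaming use. MLflow calls this method when the model is loaded as a PyFunc model with mlflow.pyfunc.load_model and predict_stream is called.
load_context: Implement the load_context method if the model needs to load additional context. See load_context() for more details.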

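Tip

Starting from MLflow 2.20.0, the `context` parameter can be removed from the `predict` and `predict_stream` functions if it is not used. For example, `def predict(self, model_input, params)` is a valid predict function signature.

Below is a simple PythonModel example that takes a list of strings and returns it unchanged.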

import mlflow


class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: list[str], params=None) -> list[str]:
        return model_input
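The three methods described above can be pictured together in one class. What follows is only a rough sketch of their shapes: to stay runnable without MLflow installed, the hypothetical ChunkedEchoModel is a plain Python class, whereas a real model would subclass mlflow.pyfunc.PythonModel and MLflow would call load_context for you when the model is loaded. The separator attribute and the upper-casing logic are illustrative assumptions, not part of MLflow.

```python
from typing import Iterator


# Assumption: in real use this would be
# `class ChunkedEchoModel(mlflow.pyfunc.PythonModel)`.
class ChunkedEchoModel:
    def load_context(self, context) -> None:
        # A real model would typically read files from context.artifacts here.
        # This sketch only sets a hypothetical attribute.
        self.separator = " "

    def predict(self, model_input: list[str], params=None) -> list[str]:
        # Batch prediction: one output per input element.
        return [text.upper() for text in model_input]

    def predict_stream(self, model_input: list[str], params=None) -> Iterator[str]:
        # Streaming prediction: yield results piece by piece instead of
        # returning them all at once.
        for text in model_input:
            for word in text.split(self.separator):
                yield word.upper()


model = ChunkedEchoModel()
model.load_context(None)  # MLflow would pass a context object here
print(model.predict(["hello world"]))  # ['HELLO WORLD']
print(list(model.predict_stream(["hello world"])))  # ['HELLO', 'WORLD']
```

The point of the sketch is the difference in return style: predict returns a complete batch, while predict_stream is a generator whose caller consumes chunks as they are produced.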

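Option 2: Define a callable

An alternative way to log a PythonModel is to define a callable that **takes a single argument** and returns predictions. Passing the callable to mlflow.pyfunc.log_model logs it as a PythonModel.

Tip

Starting from MLflow 2.20.0, you can apply the @pyfunc decorator to the callable to enable data validation on the input based on the type hints. See Type hint usage in PythonModel for more details.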

from mlflow.pyfunc.utils import pyfunc


@pyfunc
def predict(model_input: list[str]) -> list[str]:
    return model_input

Logging the model

Log your custom model with mlflow.pyfunc.log_model() from the pyfunc module.

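import mlflow

# Example input matching the list[str] type hint of MyModel.predict
input_example = ["a", "b", "c"]

with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        name="model",
        python_model=MyModel(),
        input_example=input_example,
    )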

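Validating the model before deployment

Use the mlflow.models.predict() API to validate the model's dependencies and input data before deploying it. See MLflow Model Validation for more details.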

import mlflow

mlflow.models.predict(
    model_uri=model_info.model_uri,
    input_data=["a", "b", "c"],
    env_manager="uv",
)

You can also load the model locally and validate it by running predictions.

import mlflow

pyfunc_model = mlflow.pyfunc.load_model(model_info.model_uri)
pyfunc_model.predict(["hello", "world"])

Deploying the model

The final step to use the model in production is to deploy it. Follow the MLflow model deployment guide to deploy the model.

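Type hint usage in PythonModel

Starting from MLflow 2.20.0, type hints are a valid way to define your model's interface. You can use them to declare the model's input and output types. Using type hints has the following benefits:

Data validation: MLflow validates input data against the type hints defined in the model. The validation is applied consistently whether the model is a PythonModel instance or a loaded PyFunc model.
Type hint inference: MLflow infers the model's input and output schema from the type hints defined in the model and sets the inferred schema as the logged model signature.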

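Supported type hints

Type hints used in a PythonModel's input signature must be of type list[...], because a PythonModel's predict function expects batch input data. The following type hints are supported as the element type of list[...]:

Primitive types: int, float, str, bool, bytes, datetime.datetime
Collection types: list, dict
Union types: Union[type1, type2, ...] or type1 | type2 | ...
Optional types: Optional[type]
Pydantic models: subclasses of pydantic.BaseModel (fields must be of the supported types listed in this section)
typing.Any: Any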

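Limitations of type hint usage

Pydantic models: optional fields must declare a default value.
Union types: a union of more than one valid type is inferred as AnyType in MLflow, and MLflow does no data validation based on it.
Optional types: an Optional type cannot be used directly in list[...], because the predict function's input should not be None.

Below are some examples of supported type hints: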

list[str], list[int], list[float], list[bool], list[bytes], list[datetime.datetime]
list[list[str]]...
list[dict[str, str]], list[dict[str, int]], list[dict[str, list[str]]]...
list[Union[int, str]], list[str | dict[str, int]]...
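To make the list[...] requirement and the limitations above more concrete, here is a small standard-library sketch that inspects a function's model_input hint. It is not MLflow's implementation: describe_input_hint and the verdict strings are hypothetical, and the sketch assumes that the Optional limitation refers to hints such as list[Optional[str]].

```python
import types
from typing import Optional, Union, get_args, get_origin, get_type_hints

# `X | Y` unions report types.UnionType on Python 3.10+; fall back to Union otherwise.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def describe_input_hint(func) -> str:
    """Return a short verdict on the `model_input` type hint of `func`."""
    hint = get_type_hints(func).get("model_input")
    if hint is None:
        return "no type hint"
    args = get_args(hint)
    if get_origin(hint) is not list or not args:
        return "rejected: input hint is not list[...]"
    element = args[0]
    if get_origin(element) in _UNION_ORIGINS:
        if type(None) in get_args(element):
            return "rejected: Optional used directly in list[...]"
        return "accepted: union element, treated as AnyType without validation"
    return "accepted"


def plain(model_input: list[str]) -> list[str]:
    return model_input


def not_batched(model_input: str) -> str:
    return model_input


def optional_element(model_input: list[Optional[str]]) -> list[str]:
    return [item or "" for item in model_input]


def union_element(model_input: list[Union[int, str]]) -> list[str]:
    return [str(item) for item in model_input]


for candidate in (plain, not_batched, optional_element, union_element):
    print(candidate.__name__, "->", describe_input_hint(candidate))
```

A check like this can be handy in unit tests to catch an unsupported signature before logging a model, although MLflow performs its own validation when the model is defined and logged.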

Below is an example of using nested Pydantic models as type hints:

from mlflow.pyfunc.utils import pyfunc
import pydantic
from typing import Optional


class Message(pydantic.BaseModel):
    role: str
    content: str


class FunctionParams(pydantic.BaseModel):
    properties: dict[str, str]
    type: str = "object"
    required: Optional[list[str]] = None
    additionalProperties: Optional[bool] = None


class ToolDefinition(pydantic.BaseModel):
    name: str
    description: Optional[str] = None
    parameters: Optional[FunctionParams] = None
    strict: Optional[bool] = None


class ChatRequest(pydantic.BaseModel):
    messages: list[Message]
    tool: Optional[ToolDefinition] = None


@pyfunc
def predict(model_input: list[ChatRequest]) -> list[list[str]]:
    return [[msg.content for msg in request.messages] for request in model_input]


input_example = [ChatRequest(messages=[Message(role="user", content="Hello")])]
print(predict(input_example))  # Output: [['Hello']]

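Using type hints in PythonModel

To use type hints in a PythonModel, declare the input and output types in the predict function signature. Below is an example PythonModel that takes a list of Message objects and returns a list of strings.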

import pydantic
import mlflow


class Message(pydantic.BaseModel):
    role: str
    content: str


class CustomModel(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: list[Message], params=None) -> list[str]:
        return [msg.content for msg in model_input]

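Type hint data validation in PythonModel

By subclassing mlflow.pyfunc.PythonModel, you get data validation based on type hints with no extra work. The validation applies both to a PythonModel instance and to a loaded PyFunc model.

The example below shows how data validation works, using the CustomModel defined above.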

model = CustomModel()

# The input_example can be a list of Message objects as defined in the type hint
input_example = [
    Message(role="system", content="Hello"),
    Message(role="user", content="Hi"),
]
print(model.predict(input_example))  # Output: ['Hello', 'Hi']

# The input_example can also be a list of dict with the same schema as Message
input_example = [
    {"role": "system", "content": "Hello"},
    {"role": "user", "content": "Hi"},
]
print(model.predict(input_example))  # Output: ['Hello', 'Hi']

# If your input doesn't match the schema, an exception is raised
# e.g. the content field is missing here, but it's required in the Message definition
try:
    model.predict([{"role": "system"}])
except Exception as e:
    # Catch the error so that the rest of the example still runs
    print(e)
# Output: 1 validation error for Message\ncontent\n  Field required [type=missing, input_value={'role': 'system'}, input_type=dict]

# The same data validation works if you log and load the model as pyfunc
model_info = mlflow.pyfunc.log_model(
    name="model",
    python_model=model,
    input_example=input_example,
)
pyfunc_model = mlflow.pyfunc.load_model(model_info.model_uri)
print(pyfunc_model.predict(input_example))

For callables, you can use the @pyfunc decorator to enable data validation based on type hints.

from mlflow.pyfunc.utils import pyfunc


@pyfunc
def predict(model_input: list[Message]) -> list[str]:
    return [msg.content for msg in model_input]


# The input_example can be a list of Message objects as defined in the type hint
input_example = [
    Message(role="system", content="Hello"),
    Message(role="user", content="Hi"),
]
print(predict(input_example))  # Output: ['Hello', 'Hi']

# The input_example can also be a list of dict with the same schema as Message
input_example = [
    {"role": "system", "content": "Hello"},
    {"role": "user", "content": "Hi"},
]
print(predict(input_example))  # Output: ['Hello', 'Hi']

# If your input doesn't match the schema, an exception is raised
# e.g. passing a list of strings here raises an exception
try:
    predict(["hello"])
except Exception as e:
    # Catch the error so that the example finishes cleanly
    print(e)
# Output: Failed to validate data against type hint `list[Message]`, invalid elements:
# [('hello', "Expecting example to be a dictionary or pydantic model instance for Pydantic type hint, got <class 'str'>")]

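Note

MLflow does not validate the model output against the type hints, but the output type hint is used for model signature inference.

Pydantic model type hint data conversion

For Pydantic model type hints, the input data can be either a Pydantic object or a dictionary that matches the Pydantic model's schema. MLflow automatically converts the provided data to the type hint object before passing it to the predict function. In the example from the previous section, [{"role": "system", "content": "Hello"}] is converted to [Message(role="system", content="Hello")] inside the predict function.

The example below shows how to use a base class as the type hint while preserving the fields of its subclasses.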

from pydantic import BaseModel, ConfigDict
from mlflow.pyfunc.utils import pyfunc


class BaseMessage(BaseModel):
    # set extra='allow' to allow extra fields in the subclass
    model_config = ConfigDict(extra="allow")

    role: str
    content: str


class SystemMessage(BaseMessage):
    system_prompt: str


class UserMessage(BaseMessage):
    user_prompt: str


@pyfunc
def predict(model_input: list[BaseMessage]) -> list[str]:
    result = []
    for msg in model_input:
        if hasattr(msg, "system_prompt"):
            result.append(msg.system_prompt)
        elif hasattr(msg, "user_prompt"):
            result.append(msg.user_prompt)
    return result


input_example = [
    {"role": "system", "content": "Hello", "system_prompt": "Hi"},
    {"role": "user", "content": "Hi", "user_prompt": "Hello"},
]
print(predict(input_example))  # Output: ['Hi', 'Hello']

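Model signature inference based on type hints

When a PythonModel with type hints is logged, MLflow automatically infers the model's input and output schema from the type hints defined in the model.

Note

Do not pass the signature parameter explicitly when logging a PythonModel with type hints. If you do pass it, MLflow still uses the signature inferred from the type hints and raises a warning if the two do not match.

The table below shows how type hints map to the schema in the model signature:

Type hint	Inferred schema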
`list[str]`	Schema([ColSpec(type=DataType.string)])
`list[list[str]]`	Schema([ColSpec(type=Array(DataType.string))])
`list[dict[str, str]]`	Schema([ColSpec(type=Map(DataType.string))])
`list[Union[int, str]]`	Schema([ColSpec(type=AnyType())])
`list[Any]`	Schema([ColSpec(type=AnyType())])
`list[pydantic.BaseModel]`	Schema([ColSpec(type=Object([...]))]) # properties based on the pydantic model fields

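Warning

Pydantic objects cannot be used with the infer_signature function. To use Pydantic objects as model inputs, you must declare the type hint as the Pydantic model in the PythonModel's predict function signature.

Input example together with type hints during model logging

When logging a PythonModel, it is recommended to provide an input example that matches the type hints defined in the model. The input example is used to validate the type hints and to check that the predict function works as expected.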

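import mlflow

mlflow.pyfunc.log_model(
    name="model",
    python_model=CustomModel(),
    # CustomModel.predict is hinted as list[Message], so the example uses Message-shaped dicts
    input_example=[{"role": "system", "content": "Hello"}],
)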

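Querying a serving endpoint that hosts a PythonModel with type hints

When querying a serving endpoint that hosts a PythonModel with type hints, you **must pass the input data under the** inputs **key in the request body**. The example below shows how to serve the model locally and query it: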

mlflow models serve -m runs:/<run_id>/model --env-manager local
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{"inputs": [{"role": "system", "content": "Hello"}]}'
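The same request can be prepared from Python. The sketch below uses only the standard library and mirrors the curl call above; build_invocation_request is a hypothetical helper, and the URL assumes the local server started by the serve command. The request is only built here, not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_invocation_request(records, url="http://127.0.0.1:5000/invocations"):
    """Wrap the input records under the `inputs` key and build a POST request."""
    body = json.dumps({"inputs": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invocation_request([{"role": "system", "content": "Hello"}])
print(request.data.decode("utf-8"))
# {"inputs": [{"role": "system", "content": "Hello"}]}

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the payload construction in one helper makes it harder to forget the inputs wrapper, which is the part that is required for models with type hints.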

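Additional allowed type hints that do not support data validation or schema inference

MLflow also accepts the following type hints in a PythonModel, but they are not used for data validation or schema inference, so a valid model signature or input example must be provided when the model is logged.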

pandas.DataFrame
pandas.Series
numpy.ndarray
scipy.sparse.csc_matrix
scipy.sparse.csr_matrix

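TypeFromExample type hint usage

MLflow provides a special type hint, TypeFromExample, that converts the input data during PyFunc prediction to match the type of your input example. This is useful if you do not want to define a type hint for the model input explicitly but still want the data to conform to the input example's type at prediction time. **To use this feature, a valid input example must be provided when the model is logged.** The input example must be one of the following types, because the predict function expects batch input data: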

list
pandas.DataFrame
pandas.Series

The example below shows how to use the TypeFromExample type hint:

import mlflow
from mlflow.types.type_hints import TypeFromExample


class Model(mlflow.pyfunc.PythonModel):
    def predict(self, model_input: TypeFromExample):
        return model_input


with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        name="model",
        python_model=Model(),
        input_example=["a", "b", "c"],
    )
pyfunc_model = mlflow.pyfunc.load_model(model_info.model_uri)
assert pyfunc_model.predict(["d", "e", "f"]) == ["d", "e", "f"]

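Warning

If neither type hints nor TypeFromExample are used, MLflow's schema enforcement converts the input data to a pandas DataFrame by default. This may not be what you want if the model expects the same type as the input example. Using a supported type hint is strongly recommended to avoid this conversion and to enable data validation based on the specified type hints.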
