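Introduction to MLflow Custom Pyfunc

In this introductory tutorial on MLflow custom Pyfunc, we look at the core features of the PythonModel class and use them to build a very simple model that can be saved, loaded, and used for inference.

Objectives: by the end of this guide, you will know how to

Define a custom PyFunc model using a Python class.
Understand the core components of the PyFunc flavor.
Save, load, and predict with a custom PyFunc model.
Apply MLflow's PyFunc to a practical example: the Lissajous curve.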

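The `PythonModel` class

MLflow's approach to generic model instance types is strictly standardized, so that any model stored with MLflow can be used for inference as long as the implementation guidelines are followed.

There are two ways to create a custom PythonModel instance. The first, which this guide uses, is to define a class along with the methods that serve as its interface. The second is to define a function named predict and pass it as the python_model argument to mlflow.pyfunc.save_model(). This approach is more limited, but it is preferable for implementations whose entire prediction logic fits in a single function. With this second mode, MLflow creates and saves a generic PythonModel class for you and attaches the predict function you supplied as the class's predict() method.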

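Core PythonModel components

MLflow's PyFunc revolves around the PythonModel class. The class has two essential methods:

load_context(self, context): loads artifacts or performs other initialization tasks. It is optional and can be used to fetch external references.
predict(self, context, model_input, params=None): the entry point the model uses to make predictions. Your custom PyFunc model must define this method.

For example, if your model relies on an external library such as XGBoost, you can load the XGBoost model in load_context and use it in predict.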

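Basic guidelines for a PythonModel

The requirements for this approach are:

Your class must be a subclass of mlflow.pyfunc.PythonModel.
Your class must implement a predict method.
The predict method must conform to the requirements of the Inference API.
The predict method must take context as its first named argument (after self).
If you want to supply parameters to the model, they must be defined as part of the model signature, and the signature must be saved with the model.
If you need to run additional logic when the model is loaded (such as loading additional dependent files), you can define a load_context method in your class.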

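Defining a simple Python model

This tutorial does not cover the more advanced load_context method or interaction with the context argument inside predict. To focus on the most basic aspects of a custom PythonModel, we keep things simple.

To show a different use of MLflow custom Pyfunc models, we step away from the typical library use case. Instead, we use MLflow to store a configured instance of a Lissajous implementation.

The Lissajous curve

Originating in the study of harmonics, Lissajous curves are sinusoidal curves defined by the following parametric equations: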

$$ x(t) = A \sin(a t + \delta) $$
$$ y(t) = B \sin(b t) $$

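where

$A$ and $B$ are the amplitudes of the curve along the x and y axes, respectively.
$a$ and $b$ determine the oscillation frequencies.
$\delta$ is the phase difference between the x and y components.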
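To make the roles of these parameters concrete, the sketch below evaluates the two equations with only the standard library. The helper name lissajous_point is hypothetical and is not part of the tutorial's model. It checks two well-known special cases: equal frequencies with a phase of pi/2 trace a circle, and equal frequencies with zero phase collapse to a diagonal line.

```python
import math


def lissajous_point(t, A=1.0, B=1.0, a=1, b=1, delta=0.0):
    """Return (x, y) on the Lissajous curve at parameter value t."""
    x = A * math.sin(a * t + delta)
    y = B * math.sin(b * t)
    return x, y


# Sample t over one full period [0, 2*pi)
ts = [2 * math.pi * k / 100 for k in range(100)]

# With a == b, A == B and delta = pi/2: x(t) = cos(t), y(t) = sin(t), a unit circle
circle = [lissajous_point(t, delta=math.pi / 2) for t in ts]
radii = [math.hypot(x, y) for x, y in circle]

# With a == b and delta = 0: x(t) == y(t), so the curve is the line y = x
diagonal = [lissajous_point(t) for t in ts]
```

Other frequency ratios and phases produce the more intricate looping patterns that the model in this tutorial plots.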

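We will build a simple model that lets users generate different patterns depending on the ratio of the oscillation frequencies and their phase.

Step 1: Define the custom PyFunc model

We start by defining a Python class for the custom model. The class should inherit from mlflow.pyfunc.PythonModel.

In our Lissajous model, we initialize it with the parameters $A$, $B$, and num_points. The predict method plots the Lissajous curve from the inputs $a$, $b$, and $\delta$.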

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import mlflow.pyfunc
from mlflow.models import infer_signature


class Lissajous(mlflow.pyfunc.PythonModel):
  def __init__(self, A=1, B=1, num_points=1000):
      self.A = A
      self.B = B
      self.num_points = num_points
      self.t_range = (0, 2 * np.pi)

  def generate_lissajous(self, a, b, delta):
      t = np.linspace(self.t_range[0], self.t_range[1], self.num_points)
      x = self.A * np.sin(a * t + delta)
      y = self.B * np.sin(b * t)
      return pd.DataFrame({"x": x, "y": y})

  def predict(self, context, model_input, params=None):  # noqa: D417
      """
      Generate and plot the Lissajous curve with annotations for parameters.

      Args:
      - model_input (pd.DataFrame): DataFrame containing columns 'a' and 'b'.
      - params (dict, optional): Dictionary containing optional parameter 'delta'.
      """
      # Extract a and b values from the input DataFrame
      a = model_input["a"].iloc[0]
      b = model_input["b"].iloc[0]

      # Extract delta from params, defaulting to 0 when params is None or has no 'delta' key
      delta = (params or {}).get("delta", 0)

      # Generate the Lissajous curve data
      df = self.generate_lissajous(a, b, delta)

      sns.set_theme()

      # Create the plot components
      fig, ax = plt.subplots(figsize=(10, 8))
      ax.plot(df["x"], df["y"])
      ax.set_title("Lissajous Curve")

      # Define the annotation string
      annotation_text = f"""
      A = {self.A}
      B = {self.B}
      a = {a}
      b = {b}
      delta = {np.round(delta, 2)} rad
      """

      # Add the annotation with a bounding box outside the plot area
      ax.annotate(
          annotation_text,
          xy=(1.05, 0.5),
          xycoords="axes fraction",
          fontsize=12,
          bbox={"boxstyle": "round,pad=0.25", "facecolor": "aliceblue", "edgecolor": "black"},
      )

      # Adjust plot borders to make space for the annotation
      plt.subplots_adjust(right=0.65)
      plt.close()

      # Return the plot
      return fig

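Step 2: Save the model

With the model class defined, we can instantiate it and save it with MLflow. The infer_signature function is useful here: it infers the model's schema automatically from example data.

Because we use params to override the delta value of the equation, we must provide the model's signature when saving. If the model is saved without a signature, the loaded instance ignores any params passed to predict and emits a warning.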

# Define the path to save the model
model_path = "lissajous_model"

# Create an instance of the model, overriding the default instance variables `A`, `B`, and `num_points`
model_10k_standard = Lissajous(1, 1, 10_000)

# Infer the model signature, ensuring that we define the params that will be available for customization at inference time
signature = infer_signature(pd.DataFrame([{"a": 1, "b": 2}]), params={"delta": np.pi / 5})

# Save our custom model to the path we defined, with the signature that we declared
mlflow.pyfunc.save_model(path=model_path, python_model=model_10k_standard, signature=signature)

/Users/benjamin.wilson/miniconda3/envs/mlflow-dev-env/lib/python3.8/site-packages/mlflow/models/signature.py:212: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details.
inputs = _infer_schema(model_input) if model_input is not None else None
/Users/benjamin.wilson/miniconda3/envs/mlflow-dev-env/lib/python3.8/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

Step 3: Load the model

Once saved, the model can be loaded and used for prediction. Here, the prediction is a plot of the Lissajous curve.

# Load our custom model from the local artifact store
loaded_pyfunc_model = mlflow.pyfunc.load_model(model_path)

Step 4: Use the model to generate curves

# Define the input DataFrame. In our custom model, we're reading only the first row of data to generate a plot.
model_input = pd.DataFrame({"a": [3], "b": [2]})

# Define a params override for the `delta` parameter
params = {"delta": np.pi / 3}

# Run predict, which will call our internal method `generate_lissajous` before generating a `matplotlib` plot showing the curve
fig = loaded_pyfunc_model.predict(model_input, params)

# Display the plot
fig

# Try a different configuration of arguments
fig2 = loaded_pyfunc_model.predict(
  pd.DataFrame({"a": [15], "b": [17]}), params={"delta": np.pi / 5}
)

fig2
