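Introduction to MLflow Custom Pyfunc

In this introductory tutorial on MLflow custom Pyfunc, we dig into the core features of the PythonModel class and use them to build a very simple model that can be saved, loaded, and used for inference.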

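Objective: By the end of this guide, you will know how to:

Define a custom PyFunc model using a Python class.
Understand the core components of the PyFunc flavor.
Save, load, and predict with a custom PyFunc model.
Apply MLflow's PyFunc flavor to a practical example: the Lissajous curve.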

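The `PythonModel` class

MLflow takes a strictly standardized approach to generic model instance types, so that any model stored with MLflow can be used for inference, provided the implementation guidelines are followed.

There are two ways to create a custom PythonModel instance. The first, which this guide uses, is to define a class together with the methods that serve as its interface. The second is to define a function named predict and pass it as the python_model argument to mlflow.pyfunc.save_model(). This approach is more limited, but it is preferable when the entire prediction logic fits in a single function. With this second logging mode, MLflow creates and logs a generic PythonModel class for you and attaches the function you supply as that class's predict() method.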

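Core PythonModel components

MLflow's PyFunc revolves around the PythonModel class. The two essential methods of this class are:

load_context(self, context): Loads artifacts or performs other initialization tasks. It is optional and can be used to fetch external references.
predict(self, context, model_input, params=None): The model's entry point when making predictions. It must be defined for your custom PyFunc model.

For example, if your model relies on an external library such as XGBoost, you can load the XGBoost model in load_context and use it in predict.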

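Basic guidelines for a PythonModel

The guidelines for this approach are as follows:

Your class must be a subclass of mlflow.pyfunc.PythonModel.
Your class must implement a predict method.
The predict method must adhere to the requirements of the Inference API.
The predict method must take context as its first named argument.
If you want to supply parameters to your model, they must be defined as part of the model signature, and the signature must be saved along with the model.
If you intend to run additional logic when the model is loaded (such as loading extra dependency files), you can define a load_context method in your class.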

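Defining a simple Python model

This tutorial does not cover the more advanced load_context method or interaction with the context argument of predict. To focus on the most basic aspects of a custom PythonModel, we keep things simple.

To show a different use of MLflow custom Pyfunc models, we set aside the typical library use case. Instead, we look at how MLflow can store a configured instance of a Lissajous implementation.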

Lissajous curves

Lissajous curves, which originate in the study of harmonics, are parametric sinusoidal curves defined by:

$$ x(t) = A \sin(a t + \delta) $$ $$ y(t) = B \sin(b t) $$

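where

$A$ and $B$ are the amplitudes of the curve along the x and y axes, respectively.
$a$ and $b$ determine the frequencies of oscillation.
$\delta$ is the phase difference between the x and y components.

We will create a simple model that lets users generate different patterns depending on the ratio of the oscillation frequencies and their phase.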
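To get a feel for how the frequency ratio and phase shape the curve, the formulas can be evaluated directly with NumPy. This is a minimal sketch; the helper name `lissajous_points` and the two parameter choices are assumptions made for illustration and are not part of the model defined later.

```python
import numpy as np


def lissajous_points(a, b, delta, A=1.0, B=1.0, num_points=1000):
    """Sample x(t) = A sin(a t + delta), y(t) = B sin(b t) for t in [0, 2*pi]."""
    t = np.linspace(0, 2 * np.pi, num_points)
    x = A * np.sin(a * t + delta)
    y = B * np.sin(b * t)
    return x, y


# Equal frequencies with a quarter-period phase shift: x = cos(t), y = sin(t),
# so every sampled point lies on the unit circle
x_circle, y_circle = lissajous_points(a=1, b=1, delta=np.pi / 2)
radius = np.sqrt(x_circle**2 + y_circle**2)

# Equal frequencies with no phase shift: x and y coincide, giving the line y = x
x_line, y_line = lissajous_points(a=1, b=1, delta=0)
```

Unequal frequencies such as a=3, b=2 produce the looping figures usually associated with Lissajous curves.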

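Step 1: Define the custom PyFunc model

We start by defining a Python class for the custom model. The class should inherit from mlflow.pyfunc.PythonModel.

In our Lissajous model, we initialize the instance with the parameters $A$, $B$, and num_points. The predict method plots the Lissajous curve from the inputs $a$, $b$, and $\delta$.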

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import mlflow.pyfunc
from mlflow.models import infer_signature


class Lissajous(mlflow.pyfunc.PythonModel):
  def __init__(self, A=1, B=1, num_points=1000):
      self.A = A
      self.B = B
      self.num_points = num_points
      self.t_range = (0, 2 * np.pi)

  def generate_lissajous(self, a, b, delta):
      t = np.linspace(self.t_range[0], self.t_range[1], self.num_points)
      x = self.A * np.sin(a * t + delta)
      y = self.B * np.sin(b * t)
      return pd.DataFrame({"x": x, "y": y})

  def predict(self, context, model_input, params=None):  # noqa: D417
      """
      Generate and plot the Lissajous curve with annotations for parameters.

      Args:
      - model_input (pd.DataFrame): DataFrame containing columns 'a' and 'b'.
      - params (dict, optional): Dictionary containing optional parameter 'delta'.
      """
      # Extract a and b values from the input DataFrame
      a = model_input["a"].iloc[0]
      b = model_input["b"].iloc[0]

      # Extract delta from params, defaulting to 0 when params is None or lacks 'delta'
      delta = (params or {}).get("delta", 0)

      # Generate the Lissajous curve data
      df = self.generate_lissajous(a, b, delta)

      sns.set_theme()

      # Create the plot components
      fig, ax = plt.subplots(figsize=(10, 8))
      ax.plot(df["x"], df["y"])
      ax.set_title("Lissajous Curve")

      # Define the annotation string
      annotation_text = f"""
      A = {self.A}
      B = {self.B}
      a = {a}
      b = {b}
      delta = {np.round(delta, 2)} rad
      """

      # Add the annotation with a bounding box outside the plot area
      ax.annotate(
          annotation_text,
          xy=(1.05, 0.5),
          xycoords="axes fraction",
          fontsize=12,
          bbox={"boxstyle": "round,pad=0.25", "facecolor": "aliceblue", "edgecolor": "black"},
      )

      # Adjust plot borders to make space for the annotation
      plt.subplots_adjust(right=0.65)
      plt.close()

      # Return the plot
      return fig

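Step 2: Save the model

With the model class defined, we can instantiate it and save it with MLflow. The infer_signature function is useful here: it automatically infers the model's input and output schema.

Because we use params to override the delta value of the equation, we need to provide the model signature when saving. If the model is saved without a signature, a loaded instance will ignore any params passed to it (and emit a warning).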

# Define the path to save the model
model_path = "lissajous_model"

# Create an instance of the model, overriding the default instance variables `A`, `B`, and `num_points`
model_10k_standard = Lissajous(1, 1, 10_000)

# Infer the model signature, ensuring that we define the params that will be available for customization at inference time
signature = infer_signature(pd.DataFrame([{"a": 1, "b": 2}]), params={"delta": np.pi / 5})

# Save our custom model to the path we defined, with the signature that we declared
mlflow.pyfunc.save_model(path=model_path, python_model=model_10k_standard, signature=signature)

/Users/benjamin.wilson/miniconda3/envs/mlflow-dev-env/lib/python3.8/site-packages/mlflow/models/signature.py:212: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details.
inputs = _infer_schema(model_input) if model_input is not None else None
/Users/benjamin.wilson/miniconda3/envs/mlflow-dev-env/lib/python3.8/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

Step 3: Load the model

Once saved, the model can be loaded back and used for prediction. Here, the prediction is a plot of a Lissajous curve.

# Load our custom model from the local artifact store
loaded_pyfunc_model = mlflow.pyfunc.load_model(model_path)

Step 4: Generate curves with the model

# Define the input DataFrame. In our custom model, we're reading only the first row of data to generate a plot.
model_input = pd.DataFrame({"a": [3], "b": [2]})

# Define a params override for the `delta` parameter
params = {"delta": np.pi / 3}

# Run predict, which will call our internal method `generate_lissajous` before generating a `matplotlib` plot showing the curve
fig = loaded_pyfunc_model.predict(model_input, params)

# Display the plot
fig

# Try a different configuration of arguments
fig2 = loaded_pyfunc_model.predict(
  pd.DataFrame({"a": [15], "b": [17]}), params={"delta": np.pi / 5}
)

fig2
