mlflow.lightgbm

The mlflow.lightgbm module provides an API for logging and loading LightGBM models. This module exports LightGBM models with the following flavors:

LightGBM (native) format: This is the main flavor that can be loaded back into LightGBM.
mlflow.pyfunc: Produced for use by generic pyfunc-based deployment tools and batch inference.

mlflow.lightgbm.autolog(log_input_examples=False, log_model_signatures=True, log_models=True, log_datasets=True, disable=False, exclusive=False, disable_for_unsupported_versions=False, silent=False, registered_model_name=None, extra_tags=None)[source]

Note

Autologging is known to be compatible with the following package versions: 4.2.0 <= lightgbm <= 4.6.0. Autologging may not succeed when used with package versions outside of this range.
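
A version string can be checked against the range quoted above before autologging is enabled. The helper below is a hypothetical, standard-library-only sketch (in_tested_range is not part of MLflow), and the bounds are copied from the note, so they may differ for other MLflow releases.

```python
import re

# Range quoted in the note above; adjust if your MLflow release documents another one
MIN_TESTED = (4, 2, 0)
MAX_TESTED = (4, 6, 0)


def in_tested_range(version: str) -> bool:
    """Return True if the major.minor.patch part of `version` is inside the range."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    parts += [0] * (3 - len(parts))  # pad short versions such as "4.3"
    return MIN_TESTED <= tuple(parts) <= MAX_TESTED


for candidate in ("4.1.0", "4.2.0", "4.5.0", "4.6.0", "4.7.0"):
    print(candidate, in_tested_range(candidate))
```

In practice the installed version can be obtained with importlib.metadata.version("lightgbm"). Alternatively, MLflow can make the decision itself when autolog is called with disable_for_unsupported_versions=True.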

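Enables (or disables) and configures autologging from LightGBM to MLflow. Logs the following:

Parameters specified in lightgbm.train.

Metrics on each iteration (if valid_sets is specified).

Metrics at the best iteration (if early_stopping_rounds is specified or an early_stopping callback is set).

Feature importance (both "split" and "gain") as JSON files and plots.

The trained model, including:

An example of valid input.

The inferred signature of the model's inputs and outputs.

Note that the scikit-learn API is now supported.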

Parameters

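log_input_examples – If True, input examples from the training dataset are collected during training and logged along with the LightGBM model artifacts. If False, input examples are not logged. Note: input examples are MLflow model attributes and are only collected if log_models is also True.
log_model_signatures – If True, ModelSignatures describing the model inputs and outputs are collected during training and logged along with the LightGBM model artifacts. If False, signatures are not logged. Note: model signatures are MLflow model attributes and are only collected if log_models is also True.
log_models – If True, trained models are logged as MLflow model artifacts. If False, trained models are not logged. Input examples and model signatures, which are attributes of MLflow models, are also omitted when log_models is False.
log_datasets – If True, training and validation dataset information is logged to MLflow Tracking, if applicable. If False, dataset information is not logged.
disable – If True, disables the LightGBM autologging integration. If False, enables the LightGBM autologging integration.
exclusive – If True, autologged content is not logged to user-created fluent runs. If False, autologged content is logged to the active fluent run, which may be user-created.
disable_for_unsupported_versions – If True, disables autologging for versions of lightgbm that have not been tested against this version of the MLflow client or are incompatible with it.
silent – If True, suppresses all event logs and warnings from MLflow during LightGBM autologging. If False, shows all events and warnings during LightGBM autologging.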
registered_model_name – If given, each time a model is trained, it is registered as a new model version of the registered model with this name. The registered model is created if it does not already exist.
extra_tags – A dictionary of extra tags to set on each managed run created by autologging.

Example

import mlflow
from lightgbm import LGBMClassifier
from sklearn import datasets


def print_auto_logged_info(run):
    tags = {k: v for k, v in run.data.tags.items() if not k.startswith("mlflow.")}
    artifacts = [
        f.path for f in mlflow.MlflowClient().list_artifacts(run.info.run_id, "model")
    ]
    feature_importances = [
        f.path
        for f in mlflow.MlflowClient().list_artifacts(run.info.run_id)
        if f.path != "model"
    ]
    print(f"run_id: {run.info.run_id}")
    print(f"artifacts: {artifacts}")
    print(f"feature_importances: {feature_importances}")
    print(f"params: {run.data.params}")
    print(f"metrics: {run.data.metrics}")
    print(f"tags: {tags}")


# Load iris dataset
X, y = datasets.load_iris(return_X_y=True, as_frame=True)

# Initialize our model
model = LGBMClassifier(objective="multiclass", random_state=42)

# Auto log all MLflow entities
mlflow.lightgbm.autolog()

# Train the model
with mlflow.start_run() as run:
    model.fit(X, y)

# fetch the auto logged parameters and metrics
print_auto_logged_info(mlflow.get_run(run_id=run.info.run_id))

Output

run_id: e08dd59d57a74971b68cf78a724dfaf6
artifacts: ['model/MLmodel',
            'model/conda.yaml',
            'model/model.pkl',
            'model/python_env.yaml',
            'model/requirements.txt']
feature_importances: ['feature_importance_gain.json',
                      'feature_importance_gain.png',
                      'feature_importance_split.json',
                      'feature_importance_split.png']
params: {'boosting_type': 'gbdt',
         'categorical_feature': 'auto',
         'colsample_bytree': '1.0',
         ...
         'verbose_eval': 'warn'}
metrics: {}
tags: {}

mlflow.lightgbm.get_default_conda_env(include_cloudpickle=False)[source]

Returns: The default Conda environment for MLflow Models produced by calls to save_model() and log_model().

mlflow.lightgbm.get_default_pip_requirements(include_cloudpickle=False)[source]

Returns: A list of default pip requirements for MLflow Models produced by this flavor. Calls to save_model() and log_model() produce a pip environment that contains these requirements at a minimum.

mlflow.lightgbm.load_model(model_uri, dst_path=None)[source]

Load a LightGBM model from a local file or a run.

Parameters

model_uri –
The location, in URI format, of the MLflow model. For example:
- /Users/me/path/to/local/model
- relative/path/to/local/model
- s3://my_bucket/path/to/model
- runs:/<mlflow_run_id>/run-relative/path/to/model
For more information about supported URI schemes, see Referencing Artifacts.
dst_path – The local filesystem path to which to download the model artifact. This directory must already exist. If unspecified, a local output path will be created.

Returns

A LightGBM model (an instance of lightgbm.Booster) or a LightGBM scikit-learn model, depending on the saved model class specification.
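
The model_uri forms listed above differ mainly in their scheme. The sketch below uses only the standard library to show how each example splits; it illustrates the URI shapes and is not how MLflow itself resolves them.

```python
from urllib.parse import urlparse

# The four example locations from the model_uri description
example_uris = [
    "/Users/me/path/to/local/model",
    "relative/path/to/local/model",
    "s3://my_bucket/path/to/model",
    "runs:/<mlflow_run_id>/run-relative/path/to/model",
]

for uri in example_uris:
    # Local paths carry no scheme; remote and run locations do
    scheme = urlparse(uri).scheme or "(none: local path)"
    print(f"{scheme:>20}  {uri}")
```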

Example

from lightgbm import LGBMClassifier
from sklearn import datasets
import mlflow

# Auto log all MLflow entities
mlflow.lightgbm.autolog()

# Load iris dataset
X, y = datasets.load_iris(return_X_y=True, as_frame=True)

# Initialize our model
model = LGBMClassifier(objective="multiclass", random_state=42)

# Train the model
model.fit(X, y)

# Load model for inference
model_uri = f"runs:/{mlflow.last_active_run().info.run_id}/model"
loaded_model = mlflow.lightgbm.load_model(model_uri)
print(loaded_model.predict(X[:5]))

Output

[0 0 0 0 0]

mlflow.lightgbm.log_model(lgb_model, artifact_path: str | None = None, conda_env=None, code_paths=None, registered_model_name=None, signature: mlflow.models.signature.ModelSignature = None, input_example: Union[pandas.core.frame.DataFrame, numpy.ndarray, dict, list, csr_matrix, csc_matrix, str, bytes, tuple] = None, await_registration_for=300, pip_requirements=None, extra_pip_requirements=None, metadata=None, name: str | None = None, params: dict[str, typing.Any] | None = None, tags: dict[str, typing.Any] | None = None, model_type: str | None = None, step: int = 0, model_id: str | None = None, **kwargs)[source]

Log a LightGBM model as an MLflow artifact for the current run.

Parameters

lgb_model – The LightGBM model (an instance of lightgbm.Booster) or a model that implements the scikit-learn API to be saved.
artifact_path – Deprecated. Use name instead.
conda_env –
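Either a dictionary representation of a Conda environment or the path to a conda environment yaml file. If provided, this describes the environment the model should be run in. At a minimum, it should specify the dependencies contained in get_default_conda_env(). If None, a Conda environment with pip requirements inferred by mlflow.models.infer_pip_requirements() is added to the model. If the requirement inference fails, it falls back to using get_default_pip_requirements. pip requirements from conda_env are written to a pip requirements.txt file and the full conda environment is written to conda.yaml. The following is an example dictionary representation of a Conda environment: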
```
{
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.8.15",
        {
            "pip": [
                "lightgbm==x.y.z"
            ],
        },
    ],
}
```
code_paths –
A list of local filesystem paths to Python file dependencies (or directories containing file dependencies). These files are prepended to the system path when the model is loaded. Files declared as dependencies for a given model should have relative imports declared from a common root path if multiple files are defined with import dependencies between them to avoid import errors when loading the model.

For a detailed explanation of code_paths functionality, recommended usage patterns and limitations, see the code_paths usage guide.
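registered_model_name – If given, create a model version under registered_model_name, also creating the registered model if one with the given name does not exist.
signature –
An instance of the ModelSignature class that describes the model's inputs and outputs. If not specified but an input_example is supplied, a signature is automatically inferred from the supplied input example and the model. To disable automatic signature inference when providing an input example, set signature to False. To manually infer a model signature, call infer_signature() on datasets with valid model inputs (for example, the training dataset with the target column omitted) and valid model outputs (for example, model predictions made on the training dataset), for example: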
```
from mlflow.models import infer_signature

train = df.drop(columns=["target_label"])  # df is a pandas DataFrame that includes the target column
predictions = ...  # compute model predictions
signature = infer_signature(train, predictions)
```
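input_example – One or several instances of valid model input. The input example is used as a hint of what data to feed the model. It is converted to a Pandas DataFrame and then serialized to json using the Pandas split-oriented format, or to a numpy array, in which case the example is serialized to json by converting it to a list. Bytes are base64-encoded. When the signature parameter is None, the input example is used to infer a model signature.
await_registration_for – Number of seconds to wait for the model version to finish being created and reach the READY status. By default, the function waits for five minutes. Specify 0 or None to skip waiting.
pip_requirements – Either an iterable of pip requirement strings (e.g. ["lightgbm", "-r requirements.txt", "-c constraints.txt"]) or the string path to a pip requirements file on the local filesystem (e.g. "requirements.txt"). If provided, this describes the environment the model should be run in. If None, a default list of requirements is inferred by mlflow.models.infer_pip_requirements() from the current software environment. If the requirement inference fails, it falls back to using get_default_pip_requirements. Both requirements and constraints are automatically parsed and written to the requirements.txt and constraints.txt files, respectively, and stored as part of the model. Requirements are also written to the pip section of the model's Conda environment (conda.yaml) file.
extra_pip_requirements –
Either an iterable of pip requirement strings (e.g. ["pandas", "-r requirements.txt", "-c constraints.txt"]) or the string path to a pip requirements file on the local filesystem (e.g. "requirements.txt"). If provided, this describes additional pip requirements that are appended to the default set of pip requirements generated automatically from the user's current software environment. Both requirements and constraints are automatically parsed and written to the requirements.txt and constraints.txt files, respectively, and stored as part of the model. Requirements are also written to the pip section of the model's Conda environment (conda.yaml) file.
Warning

The following arguments cannot be specified at the same time:
- conda_env
- pip_requirements
- extra_pip_requirements
This example demonstrates how to specify pip requirements using pip_requirements and extra_pip_requirements.
metadata – Custom metadata dictionary passed to the model and stored in the MLmodel file.
name – The model name.
params – A dictionary of parameters to log with the model.
tags – A dictionary of tags to log with the model.
model_type – The type of the model.
step – The step at which to log the model outputs and metrics.
model_id – The ID of the model.
kwargs – kwargs to pass to the lightgbm.Booster.save_model method.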

Returns

A ModelInfo instance that contains the metadata of the logged model.

Example

from lightgbm import LGBMClassifier
from sklearn import datasets
import mlflow
from mlflow.models import infer_signature

# Load iris dataset
X, y = datasets.load_iris(return_X_y=True, as_frame=True)

# Initialize our model
model = LGBMClassifier(objective="multiclass", random_state=42)

# Train the model
model.fit(X, y)

# Create model signature
predictions = model.predict(X)
signature = infer_signature(X, predictions)

# Log the model
artifact_path = "model"
with mlflow.start_run() as run:
    model_info = mlflow.lightgbm.log_model(
        model, name=artifact_path, signature=signature
    )

# Fetch the logged model artifacts
print(f"run_id: {run.info.run_id}")
client = mlflow.MlflowClient()
artifacts = [f.path for f in client.list_artifacts(run.info.run_id, artifact_path)]
print(f"artifacts: {artifacts}")

Output

artifacts: ['model/MLmodel',
            'model/conda.yaml',
            'model/model.pkl',
            'model/python_env.yaml',
            'model/requirements.txt']

mlflow.lightgbm.save_model(lgb_model, path, conda_env=None, code_paths=None, mlflow_model=None, signature: mlflow.models.signature.ModelSignature = None, input_example: Union[pandas.core.frame.DataFrame, numpy.ndarray, dict, list, csr_matrix, csc_matrix, str, bytes, tuple] = None, pip_requirements=None, extra_pip_requirements=None, metadata=None)[source]

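Save a LightGBM model to a path on the local file system.

Parameters

lgb_model – The LightGBM model (an instance of lightgbm.Booster) or a model that implements the scikit-learn API to be saved.
path – Local path where the model is to be saved.
conda_env –
Either a dictionary representation of a Conda environment or the path to a conda environment yaml file. If provided, this describes the environment the model should be run in. At a minimum, it should specify the dependencies contained in get_default_conda_env(). If None, a Conda environment with pip requirements inferred by mlflow.models.infer_pip_requirements() is added to the model. If the requirement inference fails, it falls back to using get_default_pip_requirements. pip requirements from conda_env are written to a pip requirements.txt file and the full conda environment is written to conda.yaml. The following is an example dictionary representation of a Conda environment: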
```
{
    "name": "mlflow-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.8.15",
        {
            "pip": [
                "lightgbm==x.y.z"
            ],
        },
    ],
}
```
code_paths –
A list of local filesystem paths to Python file dependencies (or directories containing file dependencies). These files are prepended to the system path when the model is loaded. Files declared as dependencies for a given model should have relative imports declared from a common root path if multiple files are defined with import dependencies between them to avoid import errors when loading the model.

For a detailed explanation of code_paths functionality, recommended usage patterns and limitations, see the code_paths usage guide.
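mlflow_model – The mlflow.models.Model to which this flavor is being added.
signature –
An instance of the ModelSignature class that describes the model's inputs and outputs. If not specified but an input_example is supplied, a signature is automatically inferred from the supplied input example and the model. To disable automatic signature inference when providing an input example, set signature to False. To manually infer a model signature, call infer_signature() on datasets with valid model inputs (for example, the training dataset with the target column omitted) and valid model outputs (for example, model predictions made on the training dataset), for example: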
```
from mlflow.models import infer_signature

train = df.drop(columns=["target_label"])  # df is a pandas DataFrame that includes the target column
predictions = ...  # compute model predictions
signature = infer_signature(train, predictions)
```
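input_example – One or several instances of valid model input. The input example is used as a hint of what data to feed the model. It is converted to a Pandas DataFrame and then serialized to json using the Pandas split-oriented format, or to a numpy array, in which case the example is serialized to json by converting it to a list. Bytes are base64-encoded. When the signature parameter is None, the input example is used to infer a model signature.
pip_requirements – Either an iterable of pip requirement strings (e.g. ["lightgbm", "-r requirements.txt", "-c constraints.txt"]) or the string path to a pip requirements file on the local filesystem (e.g. "requirements.txt"). If provided, this describes the environment the model should be run in. If None, a default list of requirements is inferred by mlflow.models.infer_pip_requirements() from the current software environment. If the requirement inference fails, it falls back to using get_default_pip_requirements. Both requirements and constraints are automatically parsed and written to the requirements.txt and constraints.txt files, respectively, and stored as part of the model. Requirements are also written to the pip section of the model's Conda environment (conda.yaml) file.
extra_pip_requirements –
Either an iterable of pip requirement strings (e.g. ["pandas", "-r requirements.txt", "-c constraints.txt"]) or the string path to a pip requirements file on the local filesystem (e.g. "requirements.txt"). If provided, this describes additional pip requirements that are appended to the default set of pip requirements generated automatically from the user's current software environment. Both requirements and constraints are automatically parsed and written to the requirements.txt and constraints.txt files, respectively, and stored as part of the model. Requirements are also written to the pip section of the model's Conda environment (conda.yaml) file.
Warning

The following arguments cannot be specified at the same time:
- conda_env
- pip_requirements
- extra_pip_requirements
This example demonstrates how to specify pip requirements using pip_requirements and extra_pip_requirements.
metadata – Custom metadata dictionary passed to the model and stored in the MLmodel file.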
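
The serialization described for input_example can be pictured with plain pandas and the standard library. This sketch only mimics the described format (split-oriented JSON for a DataFrame, base64 text for bytes); it is not MLflow's internal code, and the two-column frame is made up for illustration.

```python
import base64
import json

import pandas as pd

# A tiny, made-up input example with two feature columns
input_example = pd.DataFrame(
    {"sepal length (cm)": [5.1, 4.9], "sepal width (cm)": [3.5, 3.0]}
)

# Split orientation separates column names, row index, and row values
split_payload = json.loads(input_example.to_json(orient="split"))
print(split_payload)

# Raw bytes are not valid JSON, so they are carried as base64 text
encoded = base64.b64encode(b"\x00\x01").decode("ascii")
print(encoded)
```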

Example

from pathlib import Path
from lightgbm import LGBMClassifier
from sklearn import datasets
import mlflow

# Load iris dataset
X, y = datasets.load_iris(return_X_y=True, as_frame=True)

# Initialize our model
model = LGBMClassifier(objective="multiclass", random_state=42)

# Train the model
model.fit(X, y)

# Save the model
path = "model"
mlflow.lightgbm.save_model(model, path)

# Load model for inference
loaded_model = mlflow.lightgbm.load_model(Path.cwd() / path)
print(loaded_model.predict(X[:5]))

Output

[0 0 0 0 0]