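Working with Large Models in the MLflow Transformers Flavor

Warning

The features described in this guide are intended for advanced users who are familiar with Transformers and MLflow. Make sure you understand their limitations and potential risks before using them.

The MLflow Transformers flavor lets you track a wide range of Transformers models in MLflow. However, logging large models such as large language models (LLMs) can be resource-intensive because of their size and memory requirements. This guide outlines the MLflow features that reduce memory and disk usage when logging models, so that you can work with large models in resource-constrained environments.

Overview

The following table summarizes the different methods for logging models with the Transformers flavor. Each method has certain limitations and requirements, which are described in the sections below.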

Saving method	Description	Memory usage	Disk usage
Normal pipeline-based logging	Log a model using a pipeline instance or a dictionary of pipeline components.	High	High
Memory-efficient model logging	Log a model by specifying a path to a local checkpoint, which avoids loading the model into memory.	Low	High
Storage-efficient model logging	Log a model by saving a reference to the HuggingFace Hub repository instead of the model weights.	High	Low

Example: normal pipeline-based logging

import mlflow
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        artifact_path="model",
    )

Example: memory-efficient model logging

import mlflow

with mlflow.start_run():
    mlflow.transformers.log_model(
        # Pass a path to local checkpoint as a model
        transformers_model="/path/to/local/checkpoint",
        # Task argument is required for this saving mode.
        task="text-generation",
        artifact_path="model",
    )

Example: storage-efficient model logging

import mlflow
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        artifact_path="model",
        # Set save_pretrained to False to save storage space
        save_pretrained=False,
    )

Memory-Efficient Model Logging

This method, introduced in MLflow 2.16.1, lets you log a model without loading it into memory:

import mlflow

with mlflow.start_run():
    mlflow.transformers.log_model(
        # Pass a path to local checkpoint as a model to avoid loading the model instance
        transformers_model="path/to/local/checkpoint",
        # Task argument is required for this saving mode.
        task="text-generation",
        artifact_path="model",
    )

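In the example above, we pass the path to the local model checkpoint/weights as the model argument of the mlflow.transformers.log_model() API, instead of a pipeline instance. MLflow inspects the checkpoint's model metadata and logs the model weights without loading them into memory. This way, you can log an enormous multi-billion-parameter model to MLflow with minimal computational resources.

Important Notes

Be aware of the following requirements and limitations when using this feature:

The checkpoint directory must contain a valid config.json file and the model weight files. If a tokenizer is required, its state files must also be present in the checkpoint directory. You can save the tokenizer state there by calling tokenizer.save_pretrained("path/to/local/checkpoint").
You must specify the task argument with the task name the model is designed for.
MLflow may not infer model dependencies accurately in this mode. For more information on managing model dependencies, see Managing Dependencies in MLflow Models.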
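
The directory requirements above can be checked before calling log_model(). The following is a minimal sketch, not MLflow's own validation logic: check_checkpoint_dir is a hypothetical helper, and the file patterns it looks for (*.safetensors, *.bin, tokenizer_config.json) are assumptions based on common HuggingFace naming conventions.

```python
from pathlib import Path

# Hypothetical helper, not part of MLflow. The file patterns below follow common
# HuggingFace naming conventions and are assumptions, not MLflow's actual checks.
WEIGHT_PATTERNS = ("*.safetensors", "*.bin")
TOKENIZER_FILE = "tokenizer_config.json"


def check_checkpoint_dir(path, require_tokenizer=True):
    """Return a list of human-readable problems found in a checkpoint directory."""
    checkpoint = Path(path)
    if not checkpoint.is_dir():
        return [f"{checkpoint} is not a directory"]

    problems = []
    if not (checkpoint / "config.json").is_file():
        problems.append("missing config.json")
    # At least one file must match one of the weight-file patterns
    has_weights = any(
        next(checkpoint.glob(pattern), None) is not None
        for pattern in WEIGHT_PATTERNS
    )
    if not has_weights:
        problems.append("no model weight files found")
    if require_tokenizer and not (checkpoint / TOKENIZER_FILE).is_file():
        problems.append(f"missing {TOKENIZER_FILE}")
    return problems
```

An empty list means the directory passed these basic checks. It does not guarantee that the checkpoint loads correctly, since the helper never opens the weight files.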

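Warning

Make sure you specify the correct task argument, because an incompatible task will cause the model to fail at load time. You can check the valid task type for your model on the HuggingFace Hub.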
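
One way to perform that check programmatically is to compare the intended task with the task tag the Hub lists for the repository. The sketch below assumes the huggingface_hub package is installed and uses its model_info() function. The helper names hub_pipeline_tag and check_task are hypothetical, and the strict equality check is a conservative assumption: the Hub lists a single primary task, so a mismatch is a prompt to double-check rather than proof that the task is wrong.

```python
def hub_pipeline_tag(repo_id):
    """Look up the task tag that the HuggingFace Hub lists for a repository."""
    # Imported lazily so check_task() below works without network access
    # or the huggingface_hub package installed.
    from huggingface_hub import model_info

    return model_info(repo_id).pipeline_tag


def check_task(task, pipeline_tag):
    """Raise ValueError if the requested task differs from the Hub's task tag."""
    if pipeline_tag is None:
        raise ValueError("the Hub lists no task for this model; verify the task manually")
    if task != pipeline_tag:
        raise ValueError(
            f"task {task!r} does not match the Hub task tag {pipeline_tag!r}"
        )
    return task


# Usage sketch (needs network access; gated repositories may also need a token):
#   task = check_task("text-generation", hub_pipeline_tag("meta-llama/Meta-Llama-3.1-70B"))
```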

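Storage-Efficient Model Logging

Typically, when MLflow logs an ML model, it saves a copy of the model weights to the artifact store. However, this is not optimal when you use a pretrained model from the HuggingFace Hub and do not intend to fine-tune or otherwise modify the model or its weights before logging it. In this very common case, copying the (typically very large) model weights while you develop prompts, test inference parameters, and so on is redundant and simply wastes storage space.

To address this, MLflow 2.11.0 introduced a new argument, save_pretrained, in the mlflow.transformers.save_model() and mlflow.transformers.log_model() APIs. When this argument is set to False, MLflow skips saving the pretrained model weights and instead stores a reference to the underlying repository entry on the HuggingFace Hub; specifically, it stores the repository name and the unique commit hash of the model weights when your components or pipeline are logged. When such a reference-only model is loaded back, MLflow reads the repository name and commit hash from the saved metadata, then either downloads the model weights from the HuggingFace Hub or uses the locally cached model from your HuggingFace local cache directory.

The following example logs a model with the save_pretrained argument: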

import mlflow
import torch
import transformers

pipeline = transformers.pipeline(
    task="text-generation",
    model="meta-llama/Meta-Llama-3.1-70B",
    # Load the weights in half precision
    torch_dtype=torch.float16,
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        artifact_path="model",
        # Set save_pretrained to False to save storage space
        save_pretrained=False,
    )

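In the example above, MLflow does not save a copy of the Llama-3.1-70B model weights. Instead, it logs the metadata shown below as a reference to the HuggingFace Hub model. This saves roughly 150GB of storage space and significantly reduces logging latency for each run you start during development.

In the MLflow UI, you can see that the model is logged with its repository ID and commit hash: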

flavors:
    ...
    transformers:
        source_model_name: meta-llama/Meta-Llama-3.1-70B-Instruct
        source_model_revision: 33101ce6ccc08fa6249c10a543ebfcac65173393
        ...
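
The storage saving quoted above can be sanity-checked with simple arithmetic: weight size is roughly the parameter count times the bytes per parameter. The sketch below is a rough estimate only. The helper estimate_weight_size_gb is hypothetical, and the parameter count of 70 billion is an approximation taken from the model name; the exact figure depends on the true parameter count and on any extra files in the repository.

```python
# Rough size estimate for model weights: parameter count x bytes per parameter.
# The parameter count is an approximation taken from the model name ("70B").
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_weight_size_gb(num_params, dtype="float16"):
    """Estimate the on-disk size of model weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


size_gb = estimate_weight_size_gb(70e9, "float16")
print(f"~{size_gb:.0f} GB")  # prints "~140 GB"
```

At 16-bit precision this lands in the same range as the figure above. The same weights stored in float32 would take about twice as much space.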

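Before deploying to production, you may want to persist the model weights rather than a repository reference. To do so, use the mlflow.transformers.persist_pretrained_model() API to download the model weights from the HuggingFace Hub and save them to the artifact location. See the OSS Model Registry or Legacy Workspace Model Registry section for more information.

Registering Reference-Only Models for Production

Models logged without their weights, as in the storage-efficient method above, are "reference-only" models: the model weights are not saved to the artifact store, and only a reference to the HuggingFace Hub repository is saved. When you load such a model normally, MLflow downloads the model weights from the HuggingFace Hub.

However, this may not be suitable for production use cases, because the model weights may become unavailable or the download may fail because of network issues. MLflow provides a solution to this problem when you register a reference-only model to the Model Registry.

Databricks Unity Catalog

Registering a reference-only model to the Databricks Unity Catalog Model Registry requires no extra steps beyond the normal model registration process. MLflow automatically downloads the model weights and registers them to Unity Catalog along with the model metadata.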

import mlflow

mlflow.set_registry_uri("databricks-uc")

# Log the repository ID as a model. The model weight will not be saved to the artifact store
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        artifact_path="model",
    )

# When registering the model to Unity Catalog Model Registry, MLflow will automatically
# persist the model weight files. This may take several minutes for large models.
mlflow.register_model(model_info.model_uri, "your.model.name")

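OSS Model Registry or Legacy Workspace Model Registry

For the OSS Model Registry or the Legacy Workspace Model Registry in Databricks, you need to manually persist the model weights to the artifact store before registering the model. You can use the mlflow.transformers.persist_pretrained_model() API to download the model weights from the HuggingFace Hub and save them to the artifact location. This process does not require re-logging the model; it efficiently updates the existing model and its metadata in place.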

import mlflow

# Log the repository ID as a model. The model weight will not be saved to the artifact store
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        artifact_path="model",
    )

# Before registering the model to the non-UC model registry, persist the model weight
# from the HuggingFace Hub to the artifact location.
mlflow.transformers.persist_pretrained_model(model_info.model_uri)

# Register the model
mlflow.register_model(model_info.model_uri, "your.model.name")

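Caveats of Skipping Saving of Pretrained Model Weights

While these features are useful for saving computational resources and storage space when logging large models, there are some caveats to be aware of:

Change in model availability: If you use a model from another user's repository, the model may be deleted or made private on the HuggingFace Hub. In that case, MLflow cannot load the model back. For production use cases, it is recommended to save a copy of the model weights to the artifact store before moving from development or staging to production.
HuggingFace Hub access: Downloading a model from the HuggingFace Hub can be slow or unstable because of network latency or the status of the HuggingFace Hub service. MLflow does not provide any retry mechanism or robust error handling for model downloads from the HuggingFace Hub. You should therefore not rely on this feature for your final production-candidate run.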
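
During development, the missing retry behavior can be approximated on the caller's side. The following is a minimal sketch of a generic retry wrapper with exponential backoff. load_with_retries is a hypothetical helper, not an MLflow API, and catching every Exception is a simplifying assumption; in practice you would likely narrow it to network-related errors.

```python
import time


def load_with_retries(load_fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call load_fn(), retrying with exponential backoff if it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            return load_fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))


# Usage sketch (requires mlflow and the URI of a logged reference-only model):
#   import mlflow
#   pipeline = load_with_retries(lambda: mlflow.transformers.load_model(model_uri))
```

A wrapper like this only smooths over transient failures. It does not help if the repository has been deleted or made private, which is why persisting the weights remains the recommendation for production.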

By understanding these methods and their limitations, you can work effectively with large Transformers models in MLflow while optimizing resource usage.
